How to split a varying number of strings in a specific column with the do function

Problem description

I'm having trouble splitting a column's values when its elements contain different numbers of strings. I can do it with plyr, e.g.:

library(plyr)
column <- c("jake", "jane jane", "john john john")
df <- data.frame(1:3, name = column)
df$name <- as.character(df$name)
df2 <- ldply(strsplit(df$name, " "), rbind)
View(df2)

As a result, we get a data frame whose number of columns equals the maximum number of strings in any element.

When I tried to do the same in dplyr, I used the do function:

library(dplyr)
df2 <- df %>%
  do(data.frame(strsplit(.$name, " ")))

but I get an error:

Error in data.frame("jake", c("jane", "jane"), c("john", "john", "john") : arguments imply differing number of rows: 1, 2, 3

It seems to me that rbind should be used somewhere, but I don't know where.

Solution

You're running into trouble because strsplit() returns a list, and as.data.frame.list() would then have to be applied to each element to get it into the format that dplyr requires. Even then, it would take a bit more work to get usable results. Long story short, this doesn't seem like a suitable operation for do().
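To make the list issue concrete, here is a minimal base R sketch (not part of the original answer) of what has to happen before the pieces can be row-bound: each vector returned by strsplit() is padded with NA to a common length. The names `id`, `parts`, `piece_counts`, `width`, `padded`, and `wide` are illustrative assumptions.

```r
# Same data as in the question; `id` is an illustrative name for the first column.
column <- c("jake", "jane jane", "john john john")
df <- data.frame(id = 1:3, name = column, stringsAsFactors = FALSE)

# strsplit() returns a list with one character vector per row,
# and those vectors have different lengths (1, 2 and 3 here).
parts <- strsplit(df$name, " ", fixed = TRUE)
piece_counts <- lengths(parts)

# Pad every vector with NA up to the longest length so the rows line up.
width <- max(piece_counts)
padded <- lapply(parts, function(p) c(p, rep(NA_character_, width - length(p))))

# Row-bind the padded vectors into a character matrix, then convert it.
wide <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(wide) <- as.character(seq_len(width))
wide
```

The padding step is where the differing lengths are reconciled, which is the part a bare data.frame() call on the list cannot do.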

I think you might be better off using separate() from tidyr. It can easily be used with dplyr functions and chains. It's not clear whether you want to keep the first column since your ldply result for df2 does not have it, so I left it off.

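library(tidyr)
# `into` is passed as a character vector, which newer tidyr versions expect
separate(df[-1], name, into = as.character(1:3), sep = " ", extra = "merge")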
#      1    2    3
# 1 jake <NA> <NA>
# 2 jane jane <NA>
# 3 john john john
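When the maximum number of pieces is not known in advance, the column names for separate() can be derived from the data. The following is a sketch under assumptions rather than the answerer's code: it assumes dplyr and tidyr are installed, the `part_` prefix and the `id`, `n_parts`, `new_cols`, and `res` names are hypothetical, and fill = "right" is used so that shorter rows are padded with NA without a warning.

```r
library(dplyr)
library(tidyr)

# Same data as in the question; `id` is an illustrative name for the first column.
column <- c("jake", "jane jane", "john john john")
df <- data.frame(id = 1:3, name = column, stringsAsFactors = FALSE)

# Largest number of space-separated pieces in any row (3 for this data).
n_parts <- max(lengths(strsplit(df$name, " ", fixed = TRUE)))

# Illustrative column names: part_1, part_2, ...
new_cols <- paste0("part_", seq_len(n_parts))

# Drop the first column, then split `name` into the derived columns.
res <- df %>%
  select(name) %>%
  separate(name, into = new_cols, sep = " ", fill = "right")
res
```

Because the column names are computed from the data, the same chain keeps working if a row with four or more pieces is added later.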

You could also use cSplit() from splitstackshape. It is also very efficient, since it relies on data.table.

library(splitstackshape)
cSplit(df[-1], "name", " ")
#    name_1 name_2 name_3
# 1:   jake     NA     NA
# 2:   jane   jane     NA
# 3:   john   john   john

Or, to get the same column names as above:

library(data.table)  # setnames() comes from data.table
df2 <- cSplit(df[-1], "name", " ")
setnames(df2, as.character(1:3))  # rename the columns in place
df2
#       1    2    3
# 1: jake   NA   NA
# 2: jane jane   NA
# 3: john john john
