Problem description
I am having trouble splitting the values of a column when its elements contain different numbers of strings. I can do it with plyr, e.g.:
library(plyr)
column <- c("jake", "jane jane","john john john")
df <- data.frame(1:3, name = column)
df$name <- as.character(df$name)
df2 <- ldply(strsplit(df$name, " "), rbind)
View(df2)
The result is a data frame whose number of columns equals the maximum number of strings found in any element.
When I tried to do the same in dplyr, I used the do() function:
library(dplyr)
df2 <- df %>%
do(data.frame(strsplit(.$name, " ")))
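but I get an error:
Error in data.frame("jake", c("jane", "jane"), c("john", "john", "john") : arguments imply differing number of rows: 1, 2, 3
It seems to me that rbind should be used somewhere, but I do not know where.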
You're having trouble because strsplit() returns a list, and each element of that list would then need to go through as.data.frame.list() to get it into the format that dplyr requires. Even then, it would still take a bit more work to get usable results. Long story short, this doesn't seem like a suitable operation for do().
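To make the point about list elements of different lengths concrete, here is a minimal base R sketch (not taken from the original answer) of where rbind would fit: each split vector is first padded with NA to a common length, and only then are the rows bound together. The names parts, width and padded are illustrative.

```r
column <- c("jake", "jane jane", "john john john")

# strsplit() returns a list of character vectors of length 1, 2 and 3
parts <- strsplit(column, " ")

# The widest element decides how many columns are needed
width <- max(lengths(parts))

# Assigning a longer length pads each vector with NA up to `width`
padded <- lapply(parts, `length<-`, width)

# Rows now have equal length, so rbind() can stack them into a matrix
df2 <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(df2) <- as.character(seq_len(width))
df2
```

Without the padding step, the vectors have lengths 1, 2 and 3, which is exactly what data.frame() complains about in the error above.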
I think you might be better off using separate() from tidyr. It can easily be used with dplyr functions and chains. It's not clear whether you want to keep the first column, since your ldply result for df2 does not have it, so I left it off.
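library(tidyr)
# `into` must be a character vector; fill = "right" pads short rows with NA
separate(df[-1], name, into = as.character(1:3), sep = " ", extra = "merge", fill = "right")
#      1    2    3
# 1 jake <NA> <NA>
# 2 jane jane <NA>
# 3 john john john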
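Since separate() is said to work well inside dplyr chains, the following sketch shows one way that might look, with the number of output columns computed from the data instead of being hard-coded as 1:3. It assumes dplyr and tidyr are installed; the column name id and the variable n_parts are illustrative and not part of the original question.

```r
library(dplyr)
library(tidyr)

df <- data.frame(id = 1:3,  # `id` is an assumed name for the first column
                 name = c("jake", "jane jane", "john john john"),
                 stringsAsFactors = FALSE)

# Number of target columns = largest number of space-separated pieces
n_parts <- max(lengths(strsplit(df$name, " ")))

df2 <- df %>%
  select(name) %>%
  separate(name, into = as.character(seq_len(n_parts)),
           sep = " ", fill = "right")  # pad short rows with NA on the right
df2
```

Dropping the select(name) step keeps the id column alongside the new ones.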
You could also use cSplit(). It is also very efficient, since it relies on data.table.
library(splitstackshape)
cSplit(df[-1], "name", " ")
#    name_1 name_2 name_3
# 1:   jake     NA     NA
# 2:   jane   jane     NA
# 3:   john   john   john
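Or, to get the same column names as in the separate() result:
library(data.table)  # setnames() comes from data.table
df2 <- cSplit(df[-1], "name", " ")
# Assign first, then rename, so names(df2) refers to the new object
setnames(df2, names(df2), as.character(1:3))
df2
#       1    2    3
# 1: jake   NA   NA
# 2: jane jane   NA
# 3: john john john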