This article covers how to group by all possible combinations of several variables using dplyr.

Problem description

Given a situation such as the following:

    library(dplyr)

    myData <- tbl_df(data.frame(
      var1 = rnorm(100),
      var2 = letters[1:3] %>% sample(100, replace = TRUE) %>% factor(),
      var3 = LETTERS[1:3] %>% sample(100, replace = TRUE) %>% factor(),
      var4 = month.abb[1:3] %>% sample(100, replace = TRUE) %>% factor()
    ))

I would like to group myData to eventually find summary data grouping by all possible combinations of var2, var3, and var4.

I can create a list with all possible combinations of variables as character values with

    groupNames <- names(myData)[2:4]

    myGroups <- Map(combn,
                    list(groupNames),
                    seq_along(groupNames),
                    simplify = FALSE) %>%
      unlist(recursive = FALSE)

My plan was to make separate data sets for each variable combination with a for() loop, something like

    ### This Does Not Work
    for (i in 1:length(myGroups)) {
      assign(
        myGroups[i] %>%
          unlist() %>%
          paste0(collapse = "") %>%
          paste0("Data"),
        myData %>%
          group_by_(lapply(myGroups[[i]], as.symbol)) %>%
          summarise(
            n = length(var1),
            avgVar2 = var2 %>% mean()
          )
      )
    }

Admittedly I am not very good with lists, and looking up this issue was a bit challenging since dplyr updates have altered how grouping works a bit.

If there is a better way to do this than separate data sets I would love to know. I've gotten a loop similar to the one above working when I am only grouping by a single variable.

Any and all help is greatly appreciated! Thank you!

Solution

This seems convoluted, and there's probably a way to simplify it or fancy it up with do(), but it works. Using your myData and myGroups,

    results = lapply(myGroups, FUN = function(x) {
      do.call(what = group_by_, args = c(list(myData), x)) %>%
        summarise(
          n = length(var1),
          avgVar1 = mean(var1)
        )
    })

    > results[[1]]
    Source: local data frame [3 x 3]

      var2  n     avgVar1
    1    a 31  0.38929738
    2    b 31 -0.07451717
    3    c 38 -0.22522129

    > results[[4]]
    Source: local data frame [9 x 4]
    Groups: var2

      var2 var3  n    avgVar1
    1    a    A 11 -0.1159160
    2    a    B 11  0.5663312
    3    a    C  9  0.7904056
    4    b    A  7  0.0856384
    5    b    B 13  0.1309756
    6    b    C 11 -0.4192895
    7    c    A 15 -0.2783099
    8    c    B 10 -0.1110877
    9    c    C 13 -0.2517602

    > results[[7]]
    # I won't paste them here, but it has all 27 rows, grouped by var2, var3 and var4.

I changed your summarise call to average var1 since var2 isn't numeric.
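
The answer above relies on group_by_(), which dplyr has since deprecated along with the other underscore-suffixed verbs. As a minimal sketch of the same idea, assuming dplyr 1.0.0 or later, the grouping columns can be passed as a character vector through across(all_of(...)). The fixed seed and the "var2_var3" naming scheme for the list elements are illustrative assumptions, not part of the original answer.

```r
library(dplyr)

set.seed(42)  # assumption: fixed seed so the sketch is reproducible

myData <- tibble(
  var1 = rnorm(100),
  var2 = factor(sample(letters[1:3], 100, replace = TRUE)),
  var3 = factor(sample(LETTERS[1:3], 100, replace = TRUE)),
  var4 = factor(sample(month.abb[1:3], 100, replace = TRUE))
)

groupNames <- names(myData)[2:4]

# Every non-empty subset of the grouping columns, each as a character vector
myGroups <- lapply(seq_along(groupNames), function(k) {
  combn(groupNames, k, simplify = FALSE)
}) %>%
  unlist(recursive = FALSE)

# Name each element after its columns, e.g. "var2_var3"
# (this naming scheme is an assumption made for the example)
names(myGroups) <- vapply(myGroups, paste, character(1), collapse = "_")

# One summary table per combination, kept together in a named list
results <- lapply(myGroups, function(cols) {
  myData %>%
    group_by(across(all_of(cols))) %>%
    summarise(n = n(), avgVar1 = mean(var1), .groups = "drop")
})

results[["var2_var3"]]
```

Keeping the summaries in a named list avoids the assign() calls from the question: each table can be looked up by name, and if a single data frame is preferred, bind_rows(results, .id = "grouping") should stack them, with NA in the grouping columns a given combination does not use.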