Aggregate by group and get the count of non-NA values, mean, and sd for different data.frame columns

I am having some difficulty counting non-missing values by group with the function below (which also gives the sd and mean):

```r
test <- do.call(data.frame,
                aggregate(. ~ treatment, have,
                          function(x) c(n = sum(!is.na(x)), mean = mean(x), sd = sd(x))))
```

It ends up giving me the number of non-missing rows across all columns in the data frame instead of for each column separately.

I have been looking through SO for advice and found this, this, and this helpful, but I can't figure out why aggregate with function(x) would combine columns for sum(!is.na(x)) but not for the mean or sd.

EDIT: Adding tables

This is the data I have.
This is the data I get from my code.
This is the table I want.

You will notice in the `have` data frame that counting the non-missing rows in column var1 by treatment group gives the following:

veh - 9
gr.4 - 8
gr.3 - 10
gr.2 - 5

But when using sum(!is.na(x)) I get the following:

veh - 6
gr.4 - 5
gr.3 - 10
gr.2 - 5

I believe this is because the function uses both var1 and var2 when summing the number of non-missing values. I do not know how to correct for this.

Best,
Jack

Solution

Here's a data.table approach:

DATA

The data you have is cumbersome to read into R; please use dput() etc. to make it easier for others:

```r
library(data.table)

# Output of dput(dt), assigned back to dt so the example is reproducible
dt <- structure(list(
  someting = c("503", "553", "599", "647", "695", "728", "760", "793", "826",
               "859", "907", "955", "1003", "1036", "1084", "1131", "1179",
               "1226", "1274", "1322", "1355", "1402", "1450", "1497", "1545"),
  treatment = c("gr.2", "gr.2", "gr.2", "gr.2", "gr.2", "gr.2", "gr.2", "gr.2",
                "gr.2", "gr.2", "gr.2", "gr.3", "gr.3", "gr.3", "gr.3", "gr.3",
                "gr.3", "gr.3", "gr.3", "gr.3", "gr.3", "gr.3", "gr.3", "gr.4",
                "gr.4"),
  var1 = c(8, NA, 3, 3, NA, NA, NA, NA, NA, 8, 8, 8, NA, 8, 8, 8, 8, 8, 8, NA,
           8, 8, 8, 8, NA),
  var2 = c(8L, 8L, 8L, 8L, NA, NA, NA, NA, NA, 8L, 8L, 8L, NA, 8L, 8L, 8L, 8L,
           8L, 8L, NA, 8L, 8L, 8L, 8L, NA)),
  .Names = c("someting", "treatment", "var1", "var2"),
  row.names = c(NA, -25L), class = c("data.table", "data.frame"))
```

CODE

```r
# Count non-NA values and take the NA-free mean of each column, per treatment
dt[, .(var1.n = sum(!is.na(var1)),
       var2.n = sum(!is.na(var2)),
       var1.mean = mean(var1, na.rm = TRUE),
       var2.mean = mean(var2, na.rm = TRUE)),
   by = .(treatment)]
```

OUTPUT

```
   treatment var1.n var2.n var1.mean var2.mean
1:      gr.2      5      6         6         8
2:      gr.3     10     10         8         8
3:      gr.4      1      1         8         8
```

For some reason the "veh" entries weren't read in. Hence the output is slightly different, but the principle should be clear.
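If you want to stay with the base-R aggregate() call from the question, the undercount is consistent with how the formula method of aggregate() handles missing values: its default `na.action = na.omit` drops every row that has an NA in any variable in the formula (with `. ~ treatment`, that is every other column) before FUN sees the data, so each column's count becomes a complete-case count. A minimal sketch, assuming a data frame rebuilt from the dput() data above (no "veh" group, and the character column `someting` left out); the helper name `summarise_col` is hypothetical. It keeps all rows with `na.action = na.pass` and removes NAs inside FUN, one column at a time:

```r
# Sample data rebuilt from the dput() output above; the character column
# 'someting' is omitted because mean() and sd() need numeric input
have <- data.frame(
  treatment = rep(c("gr.2", "gr.3", "gr.4"), times = c(11, 12, 2)),
  var1 = c(8, NA, 3, 3, NA, NA, NA, NA, NA, 8, 8,
           8, NA, 8, 8, 8, 8, 8, 8, NA, 8, 8, 8,
           8, NA),
  var2 = c(8L, 8L, 8L, 8L, NA, NA, NA, NA, NA, 8L, 8L,
           8L, NA, 8L, 8L, 8L, 8L, 8L, 8L, NA, 8L, 8L, 8L,
           8L, NA)
)

# Hypothetical helper: NAs are dropped here, separately for each column
summarise_col <- function(x) {
  c(n = sum(!is.na(x)),
    mean = mean(x, na.rm = TRUE),
    sd = sd(x, na.rm = TRUE))
}

# na.action = na.pass keeps rows that have an NA in some other column;
# the default (na.omit) would drop them for all columns at once
test <- do.call(data.frame,
                aggregate(. ~ treatment, data = have,
                          FUN = summarise_col, na.action = na.pass))
print(test)
```

With this data, `var2.n` for gr.2 comes out as 6 rather than 5, because row 2 (where only var1 is missing) is no longer discarded. A group with a single non-missing value, such as gr.4 here, gets an sd of NA.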
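The data.table approach can also be extended to return the sd the question asks for, without writing one expression per column by hand. A sketch, assuming the same sample data (built here with data.table() rather than dput(), `someting` omitted); `vars` and `summary_dt` are illustrative names. Each statistic is computed on one column of `.SD` at a time, so NAs in var1 never affect the count for var2:

```r
library(data.table)

# Sample data matching the dput() output above ('someting' omitted)
dt <- data.table(
  treatment = rep(c("gr.2", "gr.3", "gr.4"), times = c(11, 12, 2)),
  var1 = c(8, NA, 3, 3, NA, NA, NA, NA, NA, 8, 8,
           8, NA, 8, 8, 8, 8, 8, 8, NA, 8, 8, 8,
           8, NA),
  var2 = c(8L, 8L, 8L, 8L, NA, NA, NA, NA, NA, 8L, 8L,
           8L, NA, 8L, 8L, 8L, 8L, 8L, 8L, NA, 8L, 8L, 8L,
           8L, NA)
)

vars <- c("var1", "var2")

summary_dt <- dt[, {
  # One list(n, mean, sd) per column, each computed on that column alone
  per_col <- lapply(.SD, function(x) list(
    n = sum(!is.na(x)),
    mean = mean(x, na.rm = TRUE),
    sd = sd(x, na.rm = TRUE)
  ))
  # Flatten to columns named var1.n, var1.mean, var1.sd, var2.n, ...
  unlist(per_col, recursive = FALSE)
}, by = treatment, .SDcols = vars]

print(summary_dt)
```

Listing the columns once in `.SDcols` means that adding another variable only requires extending `vars`.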
