How do I pass more complex functions to summarise_if or mutate_if?

Question

I have a template that I use to aggregate my data from its source to get means and 95% confidence intervals, which I then plot in ggplot. (It was originally adapted from a Stack Overflow post many years ago; apologies, but I don't know the original source.) It looks like this:

data %>%
  group_by(var1, var2) %>%
  summarise(count=n(),
            mean.outcome_variable = mean(outcome_variable, na.rm = TRUE),
            sd.outcome_variable = sd(outcome_variable, na.rm = TRUE),
            n.outcome_variable = n(),
            total.outcome_variable = sum(outcome_variable)) %>%
  mutate(se.outcome_variable = sd.outcome_variable / sqrt(n.outcome_variable),
         lower.ci.outcome_variable = mean.outcome_variable - qt(1 - (0.05 / 2), n.outcome_variable - 1) * se.outcome_variable,
         upper.ci.outcome_variable = mean.outcome_variable + qt(1 - (0.05 / 2), n.outcome_variable - 1) * se.outcome_variable)
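For reference, the interval this template computes is the usual t-based one: the mean plus or minus qt(0.975, n - 1) times sd / sqrt(n). As a minimal sketch, using a made-up vector `x` rather than any real data, the same arithmetic can be checked against base R's t.test():

```r
# Made-up sample values, purely for illustration
x <- c(4.1, 5.3, 6.0, 5.5, 4.8)

n <- length(x)
se <- sd(x) / sqrt(n)                     # standard error of the mean
t_crit <- qt(1 - (0.05 / 2), df = n - 1)  # two-sided 95% critical value

# Same formula as the lower.ci / upper.ci columns in the template
manual_ci <- c(lower = mean(x) - t_crit * se,
               upper = mean(x) + t_crit * se)

# Base R's one-sample t-test reports the same interval
builtin_ci <- t.test(x, conf.level = 0.95)$conf.int

print(manual_ci)
print(builtin_ci)
```

Note that the template counts rows with n(), so this equivalence assumes the outcome variable has no missing values within a group.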

This works well with one or two outcome variables, but copying and pasting it becomes impractical with many. I have a large number of outcome variables, all numeric, so I was hoping to use summarise_if instead. However, I do not know how to specify anything more complex than a simple function such as "mean" or "sd" in the "funs" argument. I have tried gmodels::ci() as follows:

dataset_aggregated <- data %>%
  group_by(var1, var2) %>%
  summarise_if(is.numeric, funs(mean, lowCI = ci()[2], hiCI = ci()[3])) # does not work without brackets either

But this results in:

Error in summarise_impl(.data, dots) : 
  Evaluation error: no applicable method for 'ci' applied to an object of class "NULL".

How can I get this to work?

Recommended answer

I worked out how to do this just as I got the question ready to post, but I thought I'd share it in case anyone else is having the same issue. The answer is surprisingly simple, and I can't believe it took me so long to think of it. Basically, I made custom lci() and uci() functions that each pull one bound out of the result of gmodels::ci(), and called those instead, e.g.

library(dplyr)
library(gmodels)  # provides ci()

# Lower confidence bound: the second element returned by ci()
lci <- function(data) {
  as.numeric(ci(data)[2])
}

# Upper confidence bound: the third element returned by ci()
uci <- function(data) {
  as.numeric(ci(data)[3])
}

dataset_aggregated <- dataset %>%
  # Group by as many variables as you like; just list them in select() below as well
  group_by(var1, var2) %>%
  summarise_if(is.numeric, funs(mean, lci, uci)) %>%
  # Sort the columns alphabetically so each outcome's lci, mean and uci sit together
  select(var1, var2, sort(current_vars()))
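In more recent dplyr releases, funs() is deprecated and the _if variants are superseded by across(), which accepts a named list of functions or lambdas. The following is only a sketch of how the same idea might look there, assuming dplyr 1.0.0 or later. The `ci_bound()` helper and the `example_data` frame are hypothetical, introduced here for illustration, and the helper computes a t-based bound in base R instead of calling gmodels::ci().

```r
library(dplyr)

# Hypothetical helper (not part of dplyr or gmodels): one bound of a
# t-based confidence interval for the mean of a numeric vector
ci_bound <- function(x, upper = FALSE, level = 0.95) {
  n <- length(x)
  half_width <- qt(1 - (1 - level) / 2, df = n - 1) * sd(x) / sqrt(n)
  if (upper) mean(x) + half_width else mean(x) - half_width
}

# Made-up data: two grouping variables and two numeric outcomes
example_data <- data.frame(
  var1 = rep(c("a", "b"), each = 6),
  var2 = rep(c("x", "y"), times = 6),
  outcome1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
  outcome2 = c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24)
)

dataset_aggregated <- example_data %>%
  group_by(var1, var2) %>%
  summarise(
    # Each list entry is applied to every numeric, non-grouping column;
    # results are named <column>_<entry name>, e.g. outcome1_lci
    across(
      where(is.numeric),
      list(
        mean = ~ mean(.x),
        lci  = ~ ci_bound(.x),
        uci  = ~ ci_bound(.x, upper = TRUE)
      )
    ),
    .groups = "drop"
  )

print(dataset_aggregated)
```

Because across() emits the columns outcome by outcome, in the order the list entries are given, the extra select(sort(...)) step should not be needed with this form.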

