This post covers how to pass a user-defined function in data.table.

Problem description


I would like to know how to pass a user-defined function in a data.table.

I created the following code using data.table to calculate the percentage of 'b' responses out of all valid responses ('a' or 'b') by two groups, grp1 and grp2:

The data (with a warning message):

library(data.table)
dt = data.table(rep(c("I", "II", "III", "IV")), rep(c("A", "B", "C")), 
                rep(c("a", "a", "b", "b", "b"), 20))
colnames(dt) = c("grp1", "grp2", "Q1")
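The warning appears because data.table recycles the shorter group vectors to 100 rows, and the length-3 vector does not divide 100 evenly. As a minimal sketch, assuming the intent is simply to cycle the labels, the same columns can be built explicitly with rep_len so that nothing depends on recycling. The variable n is introduced here for illustration only.

```r
library(data.table)

# Cycle each label vector to exactly n rows instead of relying on recycling.
n <- 100
dt <- data.table(
  grp1 = rep_len(c("I", "II", "III", "IV"), n),
  grp2 = rep_len(c("A", "B", "C"), n),
  Q1   = rep_len(c("a", "a", "b", "b", "b"), n)
)
```

Naming the columns inside data.table() also removes the need for the separate colnames() call.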

The code to calculate % respondents:

dt[, sum(Q1 %in% "b")/sum(!is.na(Q1))*100, by = grp1:grp2][order(grp1, grp2)]

This produces what I need (thanks to @Frank for the help in "Calculate % respondents by more than one group for a survey data"):

    grp1 grp2       V1
 1:    I    A 55.55556
 2:    I    B 62.50000
 3:    I    C 62.50000
 4:   II    A 62.50000
 5:   II    B 55.55556
 6:   II    C 62.50000
 7:  III    A 50.00000
 8:  III    B 62.50000
 9:  III    C 66.66667
10:   IV    A 66.66667
11:   IV    B 62.50000
12:   IV    C 50.00000
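The grouping in the call above can be written in several equivalent ways, which matters once the column names have to arrive as strings. The sketch below uses a small made-up table rather than the survey data, and pct_b is a hypothetical helper name, not part of data.table.

```r
library(data.table)

# Small illustrative table (not the survey data above).
dt <- data.table(
  grp1 = c("I", "I", "I", "II", "II"),
  grp2 = c("A", "A", "A", "A", "B"),
  Q1   = c("a", "b", NA, "b", "a")
)

# Hypothetical helper: share of "b" among non-missing responses, in percent.
pct_b <- function(x) sum(x %in% "b") / sum(!is.na(x)) * 100

r1 <- dt[, pct_b(Q1), by = .(grp1, grp2)]      # unquoted column names
r2 <- dt[, pct_b(Q1), by = c("grp1", "grp2")]  # character vector of names
r3 <- dt[, pct_b(Q1), by = "grp1,grp2"]        # one comma-separated string
```

The character-vector form is the one that carries over most directly to a function whose arguments are column names.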

What I would like to do is create a function and use it to calculate the equivalent set of values for 50 other items. I created the following function, hoping to minimize the repetitive work:

test = function(question, groupA, groupB){
  dt[, sum(get(question) %in% "b")/sum(!is.na(get(question)))*100, by = eval((c(groupA, groupB)))][order(groupA, groupB)]
  }

test(question = "Q1", groupA = "grp1", groupB ="grp2")

However, this returns only the top row:

   grp1 grp2       V1
1:    I    A 55.55556

I've read other questions on Stack Overflow (e.g. "Using data.table i and j arguments in functions") and tried other code, but I haven't been able to find a way to get it to work.

I'm new to R and would very much appreciate any feedback you may have.

Solution

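The issue is in the way you specify the by argument and the chained order() call. by accepts a character vector of column names directly, so eval() is not needed, and using keyby instead of by does the grouping and the sorting in one step: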

test = function(question, groupA, groupB){
  dt[, sum(get(question) %in% "b") / sum(!is.na(get(question))) * 100, 
    keyby =  c(groupA, groupB)] 
}
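As a side note on the original attempt, a quick base R check suggests why it returned a single row. Inside the function, groupA and groupB hold the strings "grp1" and "grp2", so the chained order(groupA, groupB) most likely sorted those two length-one strings rather than the columns. This sketch only reproduces that step in isolation; idx is a name introduced for illustration.

```r
groupA <- "grp1"
groupB <- "grp2"

# order() sorts the values it is given. Here those are two length-one
# strings, not the grp1 and grp2 columns, so the only index returned is 1.
idx <- order(groupA, groupB)
print(idx)
```

Used as a row index, that single value selects only the first row of the grouped result, which matches the output in the question. With keyby the ordering step is not needed at all.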

ans = test(question = "Q1", groupA = "grp1", groupB ="grp2")
#     grp1 grp2       V1
#  1:    I    A 55.55556
#  2:    I    B 62.50000
#  3:    I    C 62.50000
#  4:   II    A 62.50000
#  5:   II    B 55.55556
#  6:   II    C 62.50000
#  7:  III    A 50.00000
#  8:  III    B 62.50000
#  9:  III    C 66.66667
# 10:   IV    A 66.66667
# 11:   IV    B 62.50000
# 12:   IV    C 50.00000
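Since the goal is to repeat this for about 50 items, one possible way to apply the function to many question columns is to loop over their names and stack the results. The following is a sketch under stated assumptions: the table is made up, Q2 is a hypothetical second item, and pct_b_by is an illustrative variant of test that takes the table as an argument instead of reading dt from the enclosing environment.

```r
library(data.table)

# Illustrative data with two hypothetical items, Q1 and Q2.
dt <- data.table(
  grp1 = rep(c("I", "II"), each = 4),
  grp2 = rep(c("A", "B"), times = 4),
  Q1   = c("a", "b", "b", "b", "a", "a", NA, "b"),
  Q2   = c("b", "b", "a", NA, "a", "b", "b", "b")
)

# Same calculation as test(), with the table and the grouping columns
# passed in explicitly and the result column given a name.
pct_b_by <- function(data, question, groups) {
  data[, .(pct_b = sum(get(question) %in% "b") / sum(!is.na(get(question))) * 100),
       keyby = groups]
}

# Name the vector so rbindlist() can label each block by its item.
questions <- c("Q1", "Q2")
names(questions) <- questions

res <- rbindlist(
  lapply(questions, pct_b_by, data = dt, groups = c("grp1", "grp2")),
  idcol = "item"
)
```

The result is one long table with an item column identifying the question, which tends to be easier to filter or reshape than 50 separate objects.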

