



I would like to know how to pass a user-defined function in a data.table.

I created the following code using data.table to calculate % of responses 'b' out of all valid responses ('a' or 'b') by two groups; grp1 and grp2:

The data (with a warning message):

dt = data.table(rep(c("I", "II", "III", "IV")), rep(c("A", "B", "C")), 
                rep(c("a", "a", "b", "b", "b"), 20))
colnames(dt) = c("grp1", "grp2", "Q1")

The code to calculate % respondents:

dt[, sum(Q1 %in% "b")/sum(!is.na(Q1))*100, by = grp1:grp2][order(grp1, grp2)]

This produces what I need (thanks, @Frank, for your help at "Calculate % respondents by more than one group for a survey data"):

    grp1 grp2       V1
 1:    I    A 55.55556
 2:    I    B 62.50000
 3:    I    C 62.50000
 4:   II    A 62.50000
 5:   II    B 55.55556
 6:   II    C 62.50000
 7:  III    A 50.00000
 8:  III    B 62.50000
 9:  III    C 66.66667
10:   IV    A 66.66667
11:   IV    B 62.50000
12:   IV    C 50.00000

What I would like to do is create a function and use it to calculate the equivalent set of values for 50 other items. I wrote the following function, hoping to avoid repeating the same code for each item:

test = function(question, groupA, groupB){
  dt[, sum(get(question) %in% "b")/sum(!is.na(get(question)))*100, by = eval((c(groupA, groupB)))][order(groupA, groupB)]
}

test(question = "Q1", groupA = "grp1", groupB ="grp2")

However, this returns only the top row:

   grp1 grp2       V1
1:    I    A 55.55556

I've read other questions on Stack Overflow (e.g. "Using data.table i and j arguments in functions") and tried other code, but I haven't been able to find a way to get it to work.

I'm new to R and would very much appreciate any feedback you may have.


The issue is in the way you specify the by argument: a character vector of column names can be passed to it directly, without eval(). We can also use keyby instead of by to do the sorting in one step:

test = function(question, groupA, groupB){
  dt[, sum(get(question) %in% "b") / sum(!is.na(get(question))) * 100,
    keyby = c(groupA, groupB)]
}

ans = test(question = "Q1", groupA = "grp1", groupB ="grp2")
#   grp1  grp2       V1
# 1:   I     A 55.55556
# 2:   I     B 62.50000
# 3:   I     C 62.50000
# 4:  II     A 62.50000
# 5:  II     B 55.55556
# 6:  II     C 62.50000
# 7: III     A 50.00000
# 8: III     B 62.50000
# 9: III     C 66.66667
# 10:  IV     A 66.66667
# 11:  IV     B 62.50000
# 12:  IV     C 50.00000
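
As far as I can tell, the single row in the original attempt comes from the trailing [order(groupA, groupB)] step: inside the function groupA and groupB are the strings "grp1" and "grp2", so order() sorts two one-element character vectors and returns 1. The sketch below reproduces this outside a function; the names idx and res are mine, and setorderv() is shown only as one alternative to keyby when the column names arrive as strings.

```r
library(data.table)

# Same data as in the question; length.out avoids the recycling warning.
dt <- data.table(
  grp1 = rep(c("I", "II", "III", "IV"), length.out = 100),
  grp2 = rep(c("A", "B", "C"), length.out = 100),
  Q1   = rep(c("a", "a", "b", "b", "b"), 20)
)

groupA <- "grp1"
groupB <- "grp2"

# order() here sorts the two one-element strings, not the columns they name.
idx <- order(groupA, groupB)

# Grouping by a character vector of column names works as intended.
res <- dt[, sum(Q1 %in% "b") / sum(!is.na(Q1)) * 100, by = c(groupA, groupB)]
nrow(res)   # all 12 groups are present before the ordering step
res[idx]    # only the first group survives, as in the question

# setorderv() takes column names as strings and sorts res by reference.
setorderv(res, c(groupA, groupB))
```

With keyby the separate ordering step disappears, which is why the function above no longer needs it.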

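Since the goal is to repeat this for about 50 items, here is a minimal sketch of looping the same calculation over several question columns. It is an illustration under assumptions, not part of the original answer: the toy data, the column Q2, the function name percent_b, and the output column names pct and question are all made up for the example.

```r
library(data.table)

# Hypothetical toy data; Q2 is an invented second item.
dt <- data.table(
  grp1 = rep(c("I", "II"), each = 4),
  grp2 = rep(c("A", "A", "B", "B"), times = 2),
  Q1   = c("a", "b", "b", NA, "b", "b", "a", "a"),
  Q2   = c("b", "b", "a", "a", NA, "b", "a", "b")
)

# Percentage of "b" among non-missing answers to one question,
# grouped and sorted by the columns named in `groups`.
percent_b <- function(data, question, groups) {
  data[, .(pct = sum(get(question) %in% "b") / sum(!is.na(get(question))) * 100),
       keyby = groups]
}

questions <- c("Q1", "Q2")

# One result table per question, stacked with a column saying which item it is.
res <- rbindlist(
  lapply(setNames(questions, questions), percent_b,
         data = dt, groups = c("grp1", "grp2")),
  idcol = "question"
)
res
```

Passing the data and the grouping columns as arguments keeps the function independent of a global dt, so the same call can be reused on another table.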
