Problem description
I have a data.table with columns p1, p2, ... which contain percentages. I want to compute the quantiles for each column given a reference variable val. Conceptually, this is:
quantile(val, p1, type = 4, na.rm = T)
quantile(val, p2, type = 4, na.rm = T)
...
My attempt at using data.table is as follows:
fun <- function(x, y) quantile(y, x, type = 4, na.rm = T)
dt[, c('q1', 'q2') := lapply(.SD, fun), .SDcols = c('p1', 'p2'), by = grp]
where grp is some grouping variable. However, I am having trouble specifying the y variable in a way that keeps it fixed.
I tried the following:
fun <- function(x, y, dt) quantile(dt[, y], x, type = 4, na.rm = T)
dt[, c('q1', 'q2') := lapply(.SD, fun, y, dt), .SDcols = c('p1', 'p2'), by = grp]
But doing it this way does not enforce the grouping when the quantiles are computed. It computes each quantile over the whole range of the y variable instead of the y values within each group. What is the correct way to do this?
Edit:
Here is a trivial example with just one variable:
> dt <- data.table(y = 1:10, p1 = rep(seq(0.2, 1, 0.2), 2), g = c(rep('a', 5), rep('b', 5)))
> dt
y p1 g
1: 1 0.2 a
2: 2 0.4 a
3: 3 0.6 a
4: 4 0.8 a
5: 5 1.0 a
6: 6 0.2 b
7: 7 0.4 b
8: 8 0.6 b
9: 9 0.8 b
10: 10 1.0 b
> fun <- function(x, dt, y) quantile(dt[, y], x, type = 4, na.rm = T)
> dt[, c('q1') := lapply(.SD, fun, dt, y), .SDcols = c('p1'), by = c('g')]
> dt
y p1 g q1
1: 1 0.2 a 2
2: 2 0.4 a 4
3: 3 0.6 a 6
4: 4 0.8 a 8
5: 5 1.0 a 10
6: 6 0.2 b 2
7: 7 0.4 b 4
8: 8 0.6 b 6
9: 9 0.8 b 8
10: 10 1.0 b 10
You can see that q1 is computed using the entire range of y.
Recommended answer
I find it very strange that you would store the percentages you require in the same data.table as the data from which you want to calculate the quantiles. However, here is an approach that will work:
dt <- data.table(x=10:1,y = 1:10, p1 = rep(seq(0.2, 1, 0.2), 2), g = c(rep('a', 5), rep('b', 5)))
dt[, c('qx','qy') := Map(f = quantile, x = list(x, y), prob = list(p1), type = 4), by = g]
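For the single-variable example from the question, a simpler form may be enough. Inside j, a bare column name such as y refers only to the rows of the current group, whereas dt[, y] goes back to the full table, which is why the earlier attempt ignored the grouping. A minimal sketch, assuming data.table is installed; na.rm = TRUE and names = FALSE are my additions (the latter just drops the quantile labels):

```r
library(data.table)

# Same toy data as in the question
dt <- data.table(y  = 1:10,
                 p1 = rep(seq(0.2, 1, 0.2), 2),
                 g  = rep(c('a', 'b'), each = 5))

# y and p1 are evaluated per group, so each quantile uses only that group's y
dt[, q1 := quantile(y, probs = p1, type = 4, na.rm = TRUE, names = FALSE), by = g]
```

With this data, q1 should come out within each group's own range (1 to 5 for a, 6 to 10 for b) rather than spanning 2 to 10.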
You can use .SDcols within .SD to select the columns you want:
dt[, c('qx','qy') := Map(f = quantile, x = .SD[, .SDcols = c('x','y')],
prob = list(p1), type = 4), by = g]
Or use with = FALSE:
dt[, c('qx','qy') := Map(f = quantile, x = .SD[, c('x', 'y'), with = FALSE],
prob = list(p1), type = 4), by = g]
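Another variant that might read more naturally is to pass .SDcols as an argument of the outer call, so that .SD already contains only x and y. This is a sketch rather than a tested drop-in: it assumes that p1 stays visible in j even though it is not listed in .SDcols (the same pattern as passing a weight column to weighted.mean), and names = FALSE is again my addition.

```r
library(data.table)

dt <- data.table(x  = 10:1,
                 y  = 1:10,
                 p1 = rep(seq(0.2, 1, 0.2), 2),
                 g  = rep(c('a', 'b'), each = 5))

# .SD holds only x and y; p1 is the current group's probability column
dt[, c('qx', 'qy') := lapply(.SD, quantile, probs = p1, type = 4, names = FALSE),
   by = g, .SDcols = c('x', 'y')]
```

Here lapply walks over the columns in .SD while probs = p1 is held fixed for each of them, which is the "fixed column" behaviour the question asks for.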