R - Passing a fixed column to the lapply function in data.table

Problem description

I have a data.table with columns p1, p2, ... that contain percentages. I want to compute the quantiles for each column given a reference variable val. Conceptually, this is like:

quantile(val, p1, type = 4, na.rm = T)
quantile(val, p2, type = 4, na.rm = T)
...

My attempt at using data.table is as follows:

fun <- function(x, y) quantile(y, x, type = 4, na.rm = T)
dt[, c('q1', 'q2') := lapply(.SD, fun), .SDcols = c('p1', 'p2'), by = grp]
where grp is some grouping variable

However, I am having trouble specifying the y variable in a way that keeps it fixed.

I tried the following:

fun <- function(x, y, dt) quantile(dt[, y], x, type = 4, na.rm = T)
dt[, c('q1', 'q2') := lapply(.SD, fun, y, dt), .SDcols = c('p1', 'p2'), by = grp]

But doing it this way does not enforce the grouping when the quantiles are computed. It computes the quantiles from the whole range of the y variable instead of the y values within each group. What is the correct way to do this?

Edit:

Here is a trivial example with just one variable:

> dt <- data.table(y = 1:10, p1 = rep(seq(0.2, 1, 0.2), 2), g = c(rep('a', 5), rep('b', 5)))
> dt
     y  p1 g
 1:  1 0.2 a
 2:  2 0.4 a
 3:  3 0.6 a
 4:  4 0.8 a
 5:  5 1.0 a
 6:  6 0.2 b
 7:  7 0.4 b
 8:  8 0.6 b
 9:  9 0.8 b
10: 10 1.0 b
> fun <- function(x, dt, y) quantile(dt[, y], x, type = 4, na.rm = T)
> dt[, c('q1') := lapply(.SD, fun, dt, y), .SDcols = c('p1'), by = c('g')]
> dt
     y  p1 g q1
 1:  1 0.2 a  2
 2:  2 0.4 a  4
 3:  3 0.6 a  6
 4:  4 0.8 a  8
 5:  5 1.0 a 10
 6:  6 0.2 b  2
 7:  7 0.4 b  4
 8:  8 0.6 b  6
 9:  9 0.8 b  8
10: 10 1.0 b 10

You can see that q1 is computed using the entire range of y.

Recommended answer

I find the idea of storing the percentages you require in the same data.table as the data from which you wish to calculate the quantiles very strange; however, here is an approach that will work:

dt <- data.table(x=10:1,y = 1:10, p1 = rep(seq(0.2, 1, 0.2), 2), g = c(rep('a', 5), rep('b', 5)))


dt[, c('qx','qy') := Map(f = quantile, x = list(x, y), probs = list(p1), type = 4), by = g]
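The reason this respects the grouping is that a column named directly in j is evaluated as the current group's slice, whereas dt[, y] inside j is a fresh query on the whole table. The following is a minimal sketch of that difference using the answer's own example data; the object name sizes and the names = FALSE argument are additions for illustration only.

```r
library(data.table)

dt <- data.table(x = 10:1, y = 1:10,
                 p1 = rep(seq(0.2, 1, 0.2), 2),
                 g  = rep(c('a', 'b'), each = 5))

# A bare column name in j is the current group's slice of that column;
# dt[, y] ignores the grouping and returns the whole column.
sizes <- dt[, .(n_in_group = length(y), n_whole = length(dt[, y])), by = g]
print(sizes)

# Group-wise quantiles of x and y at the probabilities stored in p1.
# names = FALSE only drops the "20%", "40%", ... labels from the result.
dt[, c('qx', 'qy') := Map(f = quantile, x = list(x, y), probs = list(p1),
                          type = 4, names = FALSE), by = g]
print(dt)
```

Within each group, qx and qy should now be based on that group's five values only, not on all ten rows.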

You can also use .SDcols to select the columns you want, so that .SD holds only those columns:

dt[, c('qx','qy') := Map(f = quantile, x = .SD, probs = list(p1), type = 4),
   by = g, .SDcols = c('x','y')]

Or use with = FALSE:

dt[, c('qx','qy') := Map(f = quantile, x = .SD[, c('x', 'y'), with = FALSE], 
                          probs = list(p1), type = 4), by = g]
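For the layout in the original question (several probability columns p1, p2, ... and one fixed reference column), the same idea can be sketched with lapply by passing the bare column name as an extra argument. This is only an illustration under assumptions: the p2 values are made up, y is stored as numeric so every group returns the same type, and names = FALSE is added to drop the quantile labels.

```r
library(data.table)

# Same layout as the question's example, plus a second probability column
# (p2 is hypothetical, added for illustration).
dt <- data.table(y  = as.numeric(1:10),
                 p1 = rep(seq(0.2, 1, 0.2), 2),
                 p2 = rep(seq(0.1, 0.9, 0.2), 2),
                 g  = rep(c('a', 'b'), each = 5))

# x is one probability column taken from .SD; y is the fixed reference column.
fun <- function(x, y) quantile(y, x, type = 4, na.rm = TRUE, names = FALSE)

# The bare `y` in j is the current group's slice of column y, so each
# quantile is computed from that group's values only.
dt[, c('q1', 'q2') := lapply(.SD, fun, y = y),
   .SDcols = c('p1', 'p2'), by = g]
print(dt)
```

Passing y = y keeps fun free of any reference to dt itself, which is what caused the whole-table quantiles in the question's attempt.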

