Problem description
Yet another apply question.
I've reviewed a lot of documentation on the apply family of functions in R (and use them quite a bit in my work). I've defined a function myfun below that I want to apply to every row of the data frame inc (created as incdf in the sample data below). I think I need some variant of apply(inc, 1, myfun). I've played around with it for a while but still can't quite get it to work. I've included a loop that does exactly what I want; it's just very slow on my real data, which is considerably larger than the sample included here.
I expect it's a quick fix, but I can't quite put my finger on it. Maybe something with the special ... argument to apply?
In plain English, here is what the code below does: for each Submit.Date in the inc data frame, I want to count how many rows of chg have the same ID and a chg$Submit.Date within some range of that inc$Submit.Date. The range is controlled by fdays and bdays in myfun.
chgdf <- data.frame(Submit.Date=as.Date(c("2013-09-27", "2013-09-4", "2013-08-01", "2013-06-24", '2013-05-29', '2013-08-20')), ID=c('001', '001', '001', '001', '001', '005'), stringsAsFactors=F)
incdf <- data.frame(Submit.Date=as.Date(c("2013-10-19", "2013-09-14", "2013-08-22", '2013-08-20')), ID=c('001', '001', '002', '006'), stringsAsFactors=F)
# Function I want to apply to each row of the inc data frame
myfun <- function(tdate, aid, chg=chgdf, inc=incdf, fdays=30, bdays=30) {
  fdays <- tdate+fdays
  bdays <- tdate-bdays
  chg2 <- chg[chg$ID==aid & chg$Submit.Date<fdays & chg$Submit.Date>bdays, ]
  ret <- nrow(chg2)
  return(ret)
}
# Apply to a single row of the inc data frame
aid <- '001'
tdate <- incdf[incdf$ID==aid, 'Submit.Date'][1]
myfun(tdate, aid=aid, bdays=50, fdays=100)
# Works, but slow on the full data set
incdf$chgw <- 0
for(i in seq_len(nrow(incdf))){
  aid <- incdf$ID[i]
  tdate <- incdf$Submit.Date[i]
  incdf$chgw[i] <- myfun(tdate, aid, bdays=50, fdays=100)
}
Recommended answer
First, when you call apply on a data frame, it is converted to a matrix, and because the columns have mixed types all values are coerced to strings. You therefore need to convert tdate back to a Date before using it; otherwise you are trying to add days to a string:
tdate <- as.Date(tdate)
fdays <- tdate+fdays
bdays <- tdate-bdays
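The coercion can be checked directly. The following is a minimal sketch, assuming the sample incdf from the question; the names m, row_classes and first_date are illustrative and not part of the original answer.

```r
# Sample data copied from the question.
incdf <- data.frame(
  Submit.Date = as.Date(c("2013-10-19", "2013-09-14", "2013-08-22", "2013-08-20")),
  ID = c("001", "001", "002", "006"),
  stringsAsFactors = FALSE
)

# apply() works on a matrix, so the data frame goes through as.matrix();
# a Date column next to a character column collapses to character.
m <- as.matrix(incdf)
typeof(m)

# Every row therefore reaches FUN as a character vector.
row_classes <- apply(incdf, 1, function(row) class(row))

# The date has to be parsed again before doing arithmetic on it.
first_date <- as.Date(m[1, "Submit.Date"])
first_date + 30
```

Without the as.Date() call, adding 30 to the string would raise an error, which is the failure described above.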
Second, you call apply(inc, 1, myfun). In that case apply passes a single argument to myfun (the whole row) rather than the separate tdate and aid arguments that myfun expects.
Solution 1: Change your function to receive a whole row of the data frame and call apply as you did:
myfun <- function(row, chg=chgdf, inc=incdf, fdays=30, bdays=30) {
  # row arrives as a character vector: row[1] is Submit.Date, row[2] is ID
  tdate <- as.Date(row[1])
  fdays <- tdate+fdays
  bdays <- tdate-bdays
  # use the chg argument rather than the global chgdf
  chg2 <- chg[chg$ID==row[2] & chg$Submit.Date<fdays & chg$Submit.Date>bdays, ]
  ret <- nrow(chg2)
  return(ret)
}
> apply(incdf, 1, myfun)
[1] 1 2 0 0
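On the question about the special ... argument: apply forwards any extra named arguments to the function, so a row-based function can still have its defaults overridden. A minimal sketch, assuming the sample data from the question; count_changes_row is a hypothetical name for a function written in the style of Solution 1, not code from the original answer.

```r
# Sample data copied from the question.
chgdf <- data.frame(
  Submit.Date = as.Date(c("2013-09-27", "2013-09-04", "2013-08-01",
                          "2013-06-24", "2013-05-29", "2013-08-20")),
  ID = c("001", "001", "001", "001", "001", "005"),
  stringsAsFactors = FALSE
)
incdf <- data.frame(
  Submit.Date = as.Date(c("2013-10-19", "2013-09-14", "2013-08-22", "2013-08-20")),
  ID = c("001", "001", "002", "006"),
  stringsAsFactors = FALSE
)

# Hypothetical row-based counter: takes one row (a named character vector).
count_changes_row <- function(row, chg = chgdf, fdays = 30, bdays = 30) {
  tdate <- as.Date(row[["Submit.Date"]])  # undo the character coercion
  upper <- tdate + fdays
  lower <- tdate - bdays
  # Count matching rows instead of subsetting and calling nrow().
  sum(chg$ID == row[["ID"]] & chg$Submit.Date < upper & chg$Submit.Date > lower)
}

# Defaults (30 days each way).
default_counts <- apply(incdf, 1, count_changes_row)

# Extra arguments after FUN are passed through apply's ... to the function.
wide_counts <- apply(incdf, 1, count_changes_row, bdays = 50, fdays = 50)
```

With the sample data, default_counts should come out as 1 2 0 0 and wide_counts as 2 3 0 0, matching the outputs shown in the answer.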
Solution 2: Keep the multi-argument signature of myfun and have apply call it through a wrapper that passes each argument explicitly:
myfun <- function(tdate, aid, chg=chgdf, inc=incdf, fdays=30, bdays=30) {
  fdays <- tdate+fdays
  bdays <- tdate-bdays
  # use the chg argument rather than the global chgdf
  chg2 <- chg[chg$ID==aid & chg$Submit.Date<fdays & chg$Submit.Date>bdays, ]
  ret <- nrow(chg2)
  return(ret)
}
> apply(incdf, 1, function(row) myfun(as.Date(row[1]), row[2]))
[1] 1 2 0 0
I personally prefer the second solution, because it lets you override the default values of the other parameters of myfun:
> apply(incdf, 1, function(row) myfun(as.Date(row[1]), row[2], bdays=50, fdays=50))
[1] 2 3 0 0
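As a possible alternative that is not part of the original answer, iterating over row indices with vapply avoids the string coercion altogether, because each column is indexed directly and keeps its type. A minimal sketch, assuming the sample data from the question; count_changes is a hypothetical stand-in for myfun.

```r
# Sample data copied from the question.
chgdf <- data.frame(
  Submit.Date = as.Date(c("2013-09-27", "2013-09-04", "2013-08-01",
                          "2013-06-24", "2013-05-29", "2013-08-20")),
  ID = c("001", "001", "001", "001", "001", "005"),
  stringsAsFactors = FALSE
)
incdf <- data.frame(
  Submit.Date = as.Date(c("2013-10-19", "2013-09-14", "2013-08-22", "2013-08-20")),
  ID = c("001", "001", "002", "006"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for myfun: same arguments, same counting rule.
count_changes <- function(tdate, aid, chg = chgdf, fdays = 30, bdays = 30) {
  upper <- tdate + fdays
  lower <- tdate - bdays
  sum(chg$ID == aid & chg$Submit.Date < upper & chg$Submit.Date > lower)
}

# Index each column by row number; Submit.Date stays a Date, so no as.Date() is needed.
incdf$chgw <- vapply(
  seq_len(nrow(incdf)),
  function(i) count_changes(incdf$Submit.Date[i], incdf$ID[i], bdays = 50, fdays = 100),
  integer(1)  # each call must return exactly one integer
)
```

This reproduces what the loop in the question computes. It is not necessarily faster than the loop, since chgdf is still scanned once per row; the main gain is that the columns keep their types.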