Vectorizing a loop over the rows of an R data frame while accessing multiple variables of the data frame

Question

Yet another apply question.

I've read a lot of documentation on the apply family of functions in R (and use them quite a bit in my work). I've defined a function myfun below that I want to apply to every row of the data frame inc. I think I need some variant of apply(inc, 1, myfun). I've played around with it for a while but still can't quite get it to work. I've included a loop that does exactly what I want; it's just very slow and inefficient on my real data, which is considerably larger than the sample data included here.

I expect it's a quick fix, but I can't quite put my finger on it. Maybe something with the special argument ... to apply?

In plain English, here is what the code below does: I want to look at every Submit.Date in the inc data frame (incdf in the sample data) and, for each of these dates, count how many rows of chg (chgdf in the sample data) have a matching ID and a chg$Submit.Date within some range of the inc$Submit.Date. The range is controlled by fdays and bdays in myfun.

chgdf <- data.frame(Submit.Date=as.Date(c('2013-09-27', '2013-09-04', '2013-08-01', '2013-06-24', '2013-05-29', '2013-08-20')), ID=c('001', '001', '001', '001', '001', '005'), stringsAsFactors=FALSE)
incdf <- data.frame(Submit.Date=as.Date(c('2013-10-19', '2013-09-14', '2013-08-22', '2013-08-20')), ID=c('001', '001', '002', '006'), stringsAsFactors=FALSE)

The function I want to apply to every row of the data frame inc:

myfun <- function(tdate, aid, chg=chgdf, inc=incdf, fdays=30, bdays=30) {
  fdays <- tdate+fdays
  bdays <- tdate-bdays
  chg2 <- chg[chg$ID==aid & chg$Submit.Date<fdays & chg$Submit.Date>bdays, ]
  ret <- nrow(chg2)
  return(ret)
}

Applying it to a single row of the inc data frame:

aid <- '001'
tdate <- incdf[incdf$ID==aid, 'Submit.Date'][1]
myfun(tdate, aid=aid, bdays=50, fdays=100)

This works, but it is slow on the full data set:

incdf$chgw <- 0
for(i in seq_len(nrow(incdf))){
  aid <- incdf$ID[i]
  tdate <- incdf$Submit.Date[i]
  incdf$chgw[i] <- myfun(tdate, aid, bdays=50, fdays=100)
}

Recommended answer

First, when you call apply on a data frame with mixed column types, the data frame is converted to a matrix and all values are coerced to strings, so you need to convert tdate before using it. Otherwise you're trying to add days to a string:

tdate <- as.Date(tdate)
fdays <- tdate+fdays
bdays <- tdate-bdays
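
The coercion can be checked directly. The following is a minimal sketch; the data frame df is a hypothetical stand-in with the same column layout as inc, not the question's data.

```r
# Hypothetical stand-in with the same column layout as inc
df <- data.frame(
  Submit.Date = as.Date(c("2013-10-19", "2013-09-14")),
  ID = c("001", "002"),
  stringsAsFactors = FALSE
)

# apply() converts the data frame with as.matrix() first;
# a Date column next to a character column yields a character matrix
m <- as.matrix(df)
typeof(m)      # "character"

# Each row therefore reaches the function as a named character vector
row_classes <- apply(df, 1, function(row) class(row))

# Convert the date back before doing arithmetic on it
shifted <- apply(df, 1, function(row) format(as.Date(row[["Submit.Date"]]) + 30))
shifted        # "2013-11-18" "2013-10-14"
```

Without the as.Date call, the addition would be attempted on a string and fail.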

Second, you call apply(inc, 1, myfun). Note that in this case you're passing a single argument to myfun (the whole row), not the several arguments that myfun expects.
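
To see the single-argument behaviour in isolation, here is a minimal sketch. Both df and two_args are hypothetical names introduced for illustration; they are not part of the question's code.

```r
# Hypothetical one-row data frame and a function that expects two arguments
df <- data.frame(Submit.Date = as.Date("2013-10-19"), ID = "001",
                 stringsAsFactors = FALSE)
two_args <- function(tdate, aid) paste(tdate, aid)

# apply() hands the whole row to the first parameter only,
# so the second parameter is missing and the call fails
failed <- tryCatch({
  apply(df, 1, two_args)
  FALSE
}, error = function(e) TRUE)

# Wrapping the call lets each field be passed to its own parameter
wrapped <- apply(df, 1, function(row) two_args(row[["Submit.Date"]], row[["ID"]]))
wrapped        # "2013-10-19 001"
```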

Solution 1: Change your function to receive a whole row of the data frame and call apply as you did:

myfun <- function(row, chg=chgdf, inc=incdf, fdays=30, bdays=30) {
  tdate <- as.Date(row[1])  # apply passes each row as a character vector
  fdays <- tdate+fdays
  bdays <- tdate-bdays
  chgdf2 <- chg[chg$ID==row[2] & chg$Submit.Date<fdays & chg$Submit.Date>bdays, ]
  ret <- nrow(chgdf2)
  return(ret)
}
> apply(incdf, 1, myfun)
[1] 1 2 0 0

Solution 2: Keep myfun taking separate arguments and pass each of them explicitly from within the apply call:

myfun <- function(tdate, aid, chg=chgdf, inc=incdf, fdays=30, bdays=30) {
  fdays <- tdate+fdays
  bdays <- tdate-bdays
  chgdf2 <- chg[chg$ID==aid & chg$Submit.Date<fdays & chg$Submit.Date>bdays, ]
  ret <- nrow(chgdf2)
  return(ret)
}
> apply(incdf, 1, function(row) myfun(as.Date(row[1]), row[2]))
[1] 1 2 0 0

I personally prefer the second solution, because it lets you override the default values of the other parameters of myfun:

> apply(incdf, 1, function(row) myfun(as.Date(row[1]), row[2], bdays=50, fdays=50))
[1] 2 3 0 0
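
As an alternative that is not part of the answer above, iterating over row indices with vapply sidesteps the matrix conversion, so Submit.Date stays a Date and no as.Date call is needed. This is only a sketch under assumptions: count_nearby is a hypothetical rewrite of myfun that takes chg explicitly and counts matches with sum instead of subsetting, and the sample data is repeated so the block runs on its own.

```r
# Sample data from the question, repeated so the block is self-contained
chgdf <- data.frame(
  Submit.Date = as.Date(c("2013-09-27", "2013-09-04", "2013-08-01",
                          "2013-06-24", "2013-05-29", "2013-08-20")),
  ID = c("001", "001", "001", "001", "001", "005"),
  stringsAsFactors = FALSE
)
incdf <- data.frame(
  Submit.Date = as.Date(c("2013-10-19", "2013-09-14", "2013-08-22", "2013-08-20")),
  ID = c("001", "001", "002", "006"),
  stringsAsFactors = FALSE
)

# Hypothetical variant of myfun: counts the rows of chg with the same ID
# whose date lies strictly between tdate - bdays and tdate + fdays
count_nearby <- function(tdate, aid, chg, fdays = 30, bdays = 30) {
  upper <- tdate + fdays
  lower <- tdate - bdays
  sum(chg$ID == aid & chg$Submit.Date < upper & chg$Submit.Date > lower)
}

# Index-based iteration keeps each column's type intact
counts <- vapply(
  seq_len(nrow(incdf)),
  function(i) count_nearby(incdf$Submit.Date[i], incdf$ID[i], chgdf,
                           bdays = 50, fdays = 50),
  integer(1)
)
counts         # 2 3 0 0
```

This still makes one pass over chg per row of inc, so it mainly removes the coercion pitfall rather than changing the overall cost.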

