本文介绍了R-在这种情况下,如何通过应用族功能进行向量化并避免while/for循环?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在这种情况下(可以在此问题中找到更多详细信息:)

In this case (more details could be found in this question: Count how many observations in the rest of the dat fits multiple conditions? (R))

我编写了一个带有while循环的用户定义函数-为了简化操作,我仅在30天内以相同状态包含了2个条件:

I have written a user-defined function with a while loop - to make this easier, I only included 2 conditions, within 30 days and in the same state:

n = 1

f = function(i) {
  a = i[n,]
  b = a$date
  # c = a$LON
  # d = a$LAT
  e = a$STATEid
  f = a$RID
  g1 = sum(i$CASE  [i$date<= b+30 & i$date>b & i$STATEid==e], na.rm=T)
  # g2 = sum(i$viold [i$date<= b+30 & i$date>b], na.rm=T)
  # g3 = sum(i$CASE  [i$date<= b+60 & i$date>b], na.rm=T)
  # g4 = sum(i$viold [i$date<= b+60 & i$date>b], na.rm=T)
  # h = cbind(g1, g2, g3, g4)
  g1 = data.frame(g1)
  n = n+1
  assign(as.character(f), g1, envir = .GlobalEnv)
}

for(n in 1:20)(f(event2))

for (n in 1:20) (f(event2))

这已经花费了很长时间,因为它包含23,000个案例.当循环只需要运行两次时,我的16GB RAM的PC无法钉牢它!因此,我认为避免循环会更可取.您能否建议一种向量化我的代码并避免循环的方法?

This has been taking too long since it contains 23,000 cases. My PC with a 16GB Ram cannot nail it when the loop only needs to run twice! So I think avoiding loops would be preferable. Could you please suggest a way to vectorize my codes and avoid the looping?

我的主要问题是,当我需要引用每行时,当需要多个条件时,我不知道如何编写用户定义的问题;这就是为什么在循环函数中创建诸如"a","b","c","d","e"来正确地称呼他们……效率低下-我知道...

My main problem is that I don't know how to write user-defined problems when I need to refer to each row, each variable properly when multiple conditions are required - that's why in my loop function, I created objects such as "a", "b", "c", "d", "e" to call them properly... Inefficient - I know...

我的dput结果看起来像这样:

My dput outcome looks like so:

     > dput(tail(event2[,c("RID", "STATEid", "date", "LON", "LAT")]))
structure(list(RID = c("023610", "023611", "023613", "023614", 
"023615", "023616"), STATEid = structure(c(36L, 36L, 23L, 23L, 
5L, 14L), .Label = c("alabama", "alaska", "arizona", "arkansas", 
"california", "colorado", "connecticut", "delaware", "district of columbia", 
"florida", "georgia", "hawaii", "idaho", "illinois", "indiana", 
"iowa", "kansas", "kentucky", "louisiana", "maine", "maryland", 
"massachusetts", "michigan", "minnesota", "mississippi", "missouri", 
"montana", "nebraska", "nevada", "new hampshire", "new jersey", 
"new mexico", "new york", "north carolina", "north dakota", "ohio", 
"oklahoma", "oregon", "pennsylvania", "rhode island", "south carolina", 
"south dakota", "tennessee", "texas", "utah", "vermont", "virginia", 
"washington", "west virginia", "wisconsin", "wyoming"), class = "factor"), 
    date = structure(c(3620, -633, 131, -315, 5421, 3558), class = "Date"), 
    LON = c(-80.6495194, -80.6495194, -83.6129939, -83.6129939, 
    -121.6169108, -87.8328505), LAT = c(41.0997803, 41.0997803, 
    42.2411499, 42.2411499, 39.1404477, 42.4461322)), .Names = c("RID", 
"STATEid", "date", "LON", "LAT"), row.names = c(23610L, 23611L, 
23613L, 23614L, 23615L, 23616L), class = "data.frame")
> 

非常感谢.感谢您的帮助.

Many thanks. I appreciate your help.

最好

---------- 2018年1月20日更新---------

---------- Jan.20, 2018 updates ---------

我创建了一个有效的循环,并正确反映了我的期望:

I have created a loop that works and correctly reflects what I am hoping for:

g = event2[FALSE,]

USERFUN = function(i) {
  a = i[n,] # retrieve each row from the object, make it a data object
  b = a$date # get date
  # c = a$LON # for now I dropped the idea of calculating radius
  # d = a$LAT # for now I dropped the idea of calculating radius
  e = a$STATEid # get STATE
  f = a$RID # get case ID to name the objects generated!

  PostAct30 = sum(i$CASE [i$date<= b+30 & i$date>b & i$STATEid == e], na.rm=T) # multiple conditions defined here - i is the entire dataset 
  PostAct60 = sum(i$CASE [i$date<= b+60 & i$date>b & i$STATEid == e], na.rm=T) # multiple conditions defined here - b, e are dynamic, retrieving from each line!!!
  PreAct30 = sum(i$CASE [i$date<= b & i$date>b-30 & i$STATEid == e], na.rm=T)
  PreAct60 = sum(i$CASE [i$date<= b & i$date>b-30 & i$STATEid == e], na.rm=T)
  PostVio30 = sum(i$viold [i$date<= b+30 & i$date>b & i$STATEid == e], na.rm=T)
  PostVio60 = sum(i$viold [i$date<= b+60 & i$date>b & i$STATEid == e], na.rm=T)
  PreVio30 = sum(i$viold [i$date<= b & i$date>b-30 & i$STATEid == e], na.rm=T)
  PreVio60 = sum(i$viold [i$date<= b & i$date>b-30 & i$STATEid == e], na.rm=T)
  g1 = data.frame(f, PostAct30, PostAct60, PreAct30, PreAct60, PostVio30, PostVio60, PreVio30, PreVio60)
  n = n+1
  return(g1)
  }
# sum(event2$ca)
n = 1
for (n in 1:19446) {
  g2 = USERFUN(event2)
  g = rbind(g, g2)        
}

然后输出看起来像这样:

AND the output looks like so:

> tail(event3 [c("date","STATEid", "PostAct30", "PostAct60", "PostVio30", "PostVio60")])
            date    STATEid PostAct30 PostAct60 PostVio30 PostVio60
23611 1968-04-08       ohio         3         4         0         0
23612       <NA>    arizona        NA        NA        NA        NA
23613 1970-05-12   michigan         4         6         2         4
23614 1969-02-20   michigan         2         3         1         1
23615 1984-11-04 california         4         5         0         0
23616 1979-09-29   illinois         0         2         0         1

推荐答案

考虑mapply通过将 date STATEid 逐元素迭代到定义的位置来添加新列功能.具体来说,mapply生成一个由7列组成的矩阵,您将该矩阵分配给 event2 .

Consider mapply to add new columns in place by iterating date and STATEid elementwise into a defined function. Specifically, mapply produces a matrix of 7 columns which you assign to event2.

dates_calc_fct <- function(b, e) 
  c(sum(event2$CASE [event2$date<= b+30 & event2$date>b & event2$STATEid == e], na.rm=T),
    sum(event2$CASE [event2$date<= b+60 & event2$date>b & event2$STATEid == e], na.rm=T),
    sum(event2$CASE [event2$date<= b & event2$date>b-30 & event2$STATEid == e], na.rm=T),
    sum(event2$CASE [event2$date<= b & event2$date>b-30 & event2$STATEid == e], na.rm=T),
    sum(event2$viold [event2$date<= b+30 & event2$date>b & event2$STATEid == e], na.rm=T),
    sum(event2$viold [event2$date<= b+60 & event2$date>b & event2$STATEid == e], na.rm=T),
    sum(event2$viold [event2$date<= b & event2$date>b-30 & event2$STATEid == e], na.rm=T),
    sum(event2$viold [event2$date<= b & event2$date>b-30 & event2$STATEid == e], na.rm=T)
   )

event2[c("PostAct30", "PostAct60", 
         "PreAct30", "PreAct60",
         "PostVio30", "PostVio60", 
         "PreVio30", "PreVio60")] <- mapply(dates_calc_fct, event$date, event$STATEid)

这篇关于R-在这种情况下,如何通过应用族功能进行向量化并避免while/for循环?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

10-17 01:12