Problem description

Forgive me if this is a blatantly obvious question, I am a beginner R user eager to learn.

I have a data frame of 4 columns with roughly 1.5 million rows of coordinate information, where each row represents a specific location. What I would like to do is run these data through a function that holds a series of if else statements that determine which area of a larger box each location falls in. For example, a point can be in the center, along the edge of the box within 1.5 inches, on the inside of the box but neither on the edge nor at the center, or outside the box.

Each if statement determines whether a point is in a specified area, and, if it is, the if statement puts a '1' in the corresponding row of another data frame.

Here is a visualization of what I am trying to do:

Take this location data from a data frame called 'dimensions':

 sz_top | sz_bot |     px |    pz |
  3.526 |  1.615 | -1.165 | 3.748 |

Run it through these statements (the real statements are much longer), where the 'else' condition means the point is outside the box completely:

if(in center) else if(on edge) else if(in box, but not in center or on edge) else
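The if / else if chain above can also be applied to all rows at once with nested ifelse(). Nested ifelse() checks the tests in the same order, so a point that passes both "in center" and "in box" is still labeled "center". What follows is a minimal sketch. It assumes the three tests have already been computed as logical vectors with one element per row. The helper classify_region and its argument names are hypothetical.

```r
# Hypothetical helper: combines three per-row logical tests into one region
# label per row. ifelse() checks the tests in the same priority order as the
# if / else if chain, so the first test that is TRUE wins.
classify_region <- function(in_center, on_edge, in_box) {
  ifelse(in_center, "center",
         ifelse(on_edge, "edge",
                ifelse(in_box, "other_in", "out")))
}

# Toy results of the three tests for four points (one element per row)
in_center <- c(TRUE,  FALSE, FALSE, FALSE)
on_edge   <- c(FALSE, TRUE,  FALSE, FALSE)
in_box    <- c(TRUE,  TRUE,  TRUE,  FALSE)

region <- classify_region(in_center, on_edge, in_box)
region  # same as c("center", "edge", "other_in", "out")
```

A row where any test is NA comes out as NA. It may be worth checking for NAs before writing results back.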

When the program finds which condition is true, it puts a 1 in the corresponding column (columns 50-53) of ANOTHER data frame called 'calls'. This is what the row would look like if the code found that the point was in the center:

center | edge | other_in | out |
     1 |    0 |        0 |   0 |
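Once each row has a region label, the four 0/1 columns can be filled in one vectorized step instead of row by row. This minimal sketch makes three assumptions:

- region is a character vector with one label per row of calls, in the same row order.
- Columns 50-53 are center, edge, other_in and out, in that order.
- The toy calls below stands in for the real data frame only so the snippet runs.

```r
# Toy stand-in for 'calls': 3 rows, 53 all-zero columns (illustration only)
calls <- as.data.frame(matrix(0, nrow = 3, ncol = 53))

# Assumed: one region label per row of 'calls', in the same row order
region  <- c("center", "out", "edge")
regions <- c("center", "edge", "other_in", "out")  # assumed order of columns 50-53

# outer() compares every label with every region name, giving a
# rows x 4 logical matrix; multiplying by 1 turns TRUE/FALSE into 1/0
indicators <- 1 * outer(region, regions, "==")

calls[, 50:53] <- indicators
names(calls)[50:53] <- regions

calls[, 50:53]
```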

One thing to note that could improve efficiency: the coordinates are actually also contained in the 'calls' data frame in columns 22, 23, 26, and 27, but I moved them to 'dimensions' because that was easier for me to work with. This can definitely be changed.
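If the coordinates stay in calls, they can be pulled out by column position instead of being maintained as a separate copy. Because the row order is unchanged, anything computed from the result lines up row for row with calls. This is a minimal sketch. It assumes columns 22, 23, 26 and 27 hold sz_top, sz_bot, px and pz in that order. The question does not say which column is which, so check the mapping against the real data. The toy calls exists only so the snippet runs.

```r
# Toy 'calls' with 53 columns; only the coordinate columns are filled in
calls <- as.data.frame(matrix(0, nrow = 2, ncol = 53))
calls[[22]] <- c(3.526, 3.29)     # assumed sz_top
calls[[23]] <- c(1.615, 1.647)    # assumed sz_bot
calls[[26]] <- c(-1.165, -0.412)  # assumed px
calls[[27]] <- c(3.748, 1.9)      # assumed pz

# Assumed mapping of column positions to names
coord_cols <- c(sz_top = 22, sz_bot = 23, px = 26, pz = 27)

# Take just the coordinate columns and name them as in 'dimensions'
dimensions <- calls[, coord_cols]
names(dimensions) <- names(coord_cols)

dimensions
```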

I am now very unclear on how to proceed from here. I have all my if else statements written, but I am unclear on how my program will know which row it is on, so that it marks the corresponding row with the result of the tests.
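When looping, a program "knows" which row it is on only through the index it is given. The result for row i is written back to position i. The sketch below shows that pattern with the "in center" test from the question, rewritten for a single row. Some details are assumptions or illustrations:

- in_center_row is a hypothetical name.
- The inputs are assumed to be in feet, as the * 12 in the question suggests.
- The third data row is a made-up point near the center, added so the output is not all zeros.

On 1.5 million rows, a loop like this is much slower than a vectorized test. Treat it as an illustration of how the index ties input to output.

```r
# Hypothetical single-row version of the question's "in center" test.
# Assumes inputs are in feet and converts them to inches (* 12).
in_center_row <- function(sz_top, sz_bot, px, pz) {
  px_in  <- px * 12
  pz_in  <- pz * 12
  mid_in <- (sz_top * 12 + sz_bot * 12) / 2  # vertical midpoint of the box
  # Inputs are single numbers here, so && and if() are appropriate
  if (px_in >= -3 && px_in <= 3 && pz_in >= mid_in - 3 && pz_in <= mid_in + 3) 1 else 0
}

dimensions <- data.frame(
  sz_top = c(3.526, 3.29, 3.5),
  sz_bot = c(1.615, 1.647, 1.5),
  px     = c(-1.165, -0.412, 0.1),  # third row is a made-up central point
  pz     = c(3.748, 1.9, 2.5)
)

center <- numeric(nrow(dimensions))
for (i in seq_len(nrow(dimensions))) {
  # Row i of 'dimensions' produces entry i of 'center'
  center[i] <- in_center_row(dimensions$sz_top[i], dimensions$sz_bot[i],
                             dimensions$px[i], dimensions$pz[i])
}
center
```

mapply(in_center_row, dimensions$sz_top, dimensions$sz_bot, dimensions$px, dimensions$pz) runs the same loop without an explicit index.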

Please let me know if you would like any more information from me.

Thanks!

Here is a sample of the 'dimensions' data frame:

    sz_top  sz_bot  px      pz
1   3.526   1.615   -1.165  3.748
2   3.29    1.647   -0.412  1.9
3   3.29    1.647   -1.213  1.352
4   3.565   1.75    -1.041  2.419
5   3.565   1.75    -0.357  1.776
6   3.565   1.75    0.838   0.834
7   3.541   1.724   -1.619  3.661
8   3.541   1.724   -2.498  2.421
9   3.541   1.724   -1.673  2.348
10  3.541   1.724   -1.572  2.982
11  3.305   1.5     -1.316  2.842

Here is an example of one of my if statements. The others are fairly similar, just looking at different locations around the box in question:

  if(
    ((as.numeric(as.character(dimensions$px))*12)>= -3)
    &&
      ((as.numeric(as.character(dimensions$px))*12)<= 3)
    &&
      ((as.numeric(as.character(dimensions$pz))*12)<=((as.numeric(as.character(dimensions$sz_top))*12-as.numeric(as.character(dimensions$sz_bot))*12)/2)+(as.numeric(as.character(dimensions$sz_bot))*12)+3)
    &&
      ((as.numeric(as.character(dimensions$pz))*12)>=((as.numeric(as.character(dimensions$sz_top))*12-as.numeric(as.character(dimensions$sz_bot))*12)/2)+(as.numeric(as.character(dimensions$sz_bot))*12)-3)
  ){return(1)
  }

Recommended answer

If I understand correctly, the following will return a numeric vector of ones and zeros that you can slot into the appropriate column of calls.

dimensions <- read.table(text='sz_top  sz_bot  px  pz
1   3.526   1.615   -1.165  3.748
2   3.29    1.647   -0.412  1.9
3   3.29    1.647   -1.213  1.352
4   3.565   1.75    -1.041  2.419
5   3.565   1.75    -0.357  1.776
6   3.565   1.75    0.838   0.834
7   3.541   1.724   -1.619  3.661
8   3.541   1.724   -2.498  2.421
9   3.541   1.724   -1.673  2.348
10  3.541   1.724   -1.572  2.982
11  3.305   1.5 -1.316  2.842', header=TRUE, row.names=1)


# 1 if the point is within 3 inches of the box's horizontal center
# and within 3 inches of its vertical midpoint, otherwise 0
as.numeric(
  dimensions$px*12 >= -3
  & dimensions$px*12 <= 3
  & dimensions$pz*12 <=
    (dimensions$sz_top*12 - dimensions$sz_bot*12)/2 + (dimensions$sz_bot*12) + 3
  & dimensions$pz*12 >=
    (dimensions$sz_top*12 - dimensions$sz_bot*12)/2 + (dimensions$sz_bot*12) - 3)

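By using the single ampersand &, R compares the vectors element by element, so the condition is evaluated for every row of the data.frame and yields one TRUE/FALSE per row. The double && (like if) expects a single TRUE/FALSE: it looks at only one value per side and stops at the first FALSE, and recent versions of R raise an error when either is given a longer vector.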
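A tiny illustration of the difference, using made-up logical vectors. The last line uses tryCatch() to capture the error message instead of stopping. That line's result depends on the R version, so no output is claimed for it.

```r
x <- c(TRUE, TRUE, FALSE)
y <- c(TRUE, FALSE, FALSE)

# & works element by element: one TRUE/FALSE per position (i.e. per row)
both <- x & y
both              # TRUE FALSE FALSE
as.numeric(both)  # 1 0 0, the form returned by the answer's code

# && is meant for single values. With length-3 inputs, recent R versions
# signal an error (older versions quietly used only the first elements),
# so the message is captured here instead of stopping the script.
tryCatch(x && y, error = function(e) conditionMessage(e))
```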

I've removed as.numeric and as.character for clarity (not sure why these are necessary anyway... were these data read in as factors? If so, perhaps try stringsAsFactors = FALSE).
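If the columns did come in as factors, the as.character() step matters. Calling as.numeric() directly on a factor returns its internal level codes, not the numbers. The demonstration below uses values copied from the sample data. It also shows converting every column once, up front, so the tests themselves stay free of conversions.

A few related facts:

- Since R 4.0.0, stringsAsFactors defaults to FALSE in read.table() and data.frame().
- With stringsAsFactors = FALSE, a column that is still not numeric is read as character. That usually means some entry is not a parseable number, and as.numeric() turns such entries into NA with a warning.

```r
# Numbers that were read in as a factor
f <- factor(c("3.526", "1.615", "3.526"))

as.numeric(f)                # level codes, not the values: 2 1 2
as.numeric(as.character(f))  # the actual values: 3.526 1.615 3.526
as.numeric(levels(f))[f]     # same values; the idiom suggested in ?factor

# Convert every column once, instead of inside each comparison
dims_raw <- data.frame(px = factor(c("-1.165", "-0.412")),
                       pz = factor(c("3.748", "1.9")))
dims_raw[] <- lapply(dims_raw, function(col) as.numeric(as.character(col)))
str(dims_raw)
```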

09-05 17:55