Applying an if/else statement involving two data frame columns in R

Question


I am trying to modify a dataframe of two columns, to add a third that returns four possible expressions depending on the contents of the other columns (i.e. whether each is positive or negative).

I have tried a couple of approaches, the 'mutate' function in dplyr as well as sapply. Unfortunately I seem to be missing something as I get the error "the condition has length > 1 and only the first element will be used". So only the first iteration is applied to each row in the new column.

A reproducible example (of the mutate approach I've tried) is as follows:

Costs <- c(2, -5, -7, 3, 12)
Outcomes <- c(-2, 5, -7, 3, -2)

results <- as.data.frame(cbind(Costs, Outcomes))
results

quadrant <- function(cost,outcome) {
        if (costs < 0 &
            outcomes < 0) {
                "SW Quadrant"
        }
        else if (costs<0 & outcomes>0){
                "Dominant"
        }
        else if (costs>0 & outcomes<0){
                "Dominated"
        }
        else{""}
}


results <- mutate(results,Quadrant = quadrant(Costs,Outcomes)
        )

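The full warning message is:

Warning messages:
1: Problem with mutate() input Quadrant.
i the condition has length > 1 and only the first element will be used
i Input Quadrant is quadrant(results$Costs, results$Outcomes).
2: In if (costs < 0 & outcomes < 0) { :
  the condition has length > 1 and only the first element will be used
3: Problem with mutate() input Quadrant.
i the condition has length > 1 and only the first element will be used
i Input Quadrant is quadrant(results$Costs, results$Outcomes).
4: In if (costs < 0 & outcomes > 0) { :
  the condition has length > 1 and only the first element will be used
5: Problem with mutate() input Quadrant.
i the condition has length > 1 and only the first element will be used
i Input Quadrant is quadrant(results$Costs, results$Outcomes).
6: In if (costs > 0 & results[...] :
  the condition has length > 1 and only the first element will be used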

My attempt at the sapply function:

results <- sapply(results$Quadrant,quadrant(results$Costs,results$Outcomes))

Leads to the following error, with consistent warning messages to the mutate approach.

Error in get(as.character(FUN), mode = "function", envir = envir) : object 'Dominated' of mode 'function' was not found

I'm sure I'm missing something obvious here. Grateful for any suggestions.

Recommended answer

There are two problems with the function.

  1. You define the function with the argument cost but use costs in the body (the same goes for outcome and outcomes);
  2. You use if, which strictly requires a logical condition of length 1, and two things go wrong there: you use &, which should almost never appear bare like this in an if statement (use && instead), and you are passing vectors, so cost < 0 returns a logical vector the same length as cost (which is greater than 1 here).
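
To make the second point concrete, here is a minimal base R sketch using the vectors from the question. The names is_negative and first_is_sw are illustrative only and do not appear in the original code.

```r
Costs <- c(2, -5, -7, 3, 12)
Outcomes <- c(-2, 5, -7, 3, -2)

# A comparison on a vector is itself a vector: one TRUE/FALSE per element.
is_negative <- Costs < 0
length(is_negative)  # 5, so it cannot serve directly as an `if` condition

# `if` needs exactly one TRUE/FALSE. Either reduce the vector explicitly ...
if (any(is_negative)) message("at least one cost is negative")

# ... or test a single pair of elements, where && is the appropriate operator.
first_is_sw <- Costs[1] < 0 && Outcomes[1] < 0
first_is_sw
```

Passing the whole length-5 vector to if produces the warning quoted in the question on older versions of R; more recent versions treat it as an error instead.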

Suggestions:

quadrant_sgl <- function(cost, outcome) {
  if (cost < 0 && outcome < 0) return("SW Quadrant")
  if (cost < 0 && outcome > 0) return("Dominant")
  if (cost > 0 && outcome < 0) return("Dominated")
  return("")
}

quadrant_vec1 <- function(cost, outcome) {
  ifelse(cost < 0 & outcome < 0, "SW Quadrant",
         ifelse(cost < 0 & outcome > 0, "Dominant",
                ifelse(cost > 0 & outcome < 0, "Dominated",
                       "")))
}

quadrant_vec2 <- function(cost, outcome) {
  ifelse(cost < 0,
         ifelse(outcome < 0, "SW Quadrant", "Dominant"),
         ifelse(outcome < 0, "Dominated", ""))
}

quadrant_vec3 <- function(cost, outcome) {
  dplyr::case_when(
    cost < 0 & outcome < 0 ~ "SW Quadrant",
    cost < 0 & outcome > 0 ~ "Dominant",
    cost > 0 & outcome < 0 ~ "Dominated",
    TRUE ~ ""
  )
}

quadrant_vec4 <- function(cost, outcome) {
  data.table::fcase(
    cost < 0 & outcome < 0, "SW Quadrant",
    cost < 0 & outcome > 0, "Dominant",
    cost > 0 & outcome < 0, "Dominated",
    rep(TRUE, length(cost)), ""
  )
}

The first function (quadrant_sgl) remains a single-operation (non-vectorized) function; it is included to show how such a function can be turned into a vectorized one. If you aren't familiar with the concept of vectorization, know that (1) R does it well, (2) R prefers it, and (3) this is not the best venue to talk at length about it. Search for "R vectorization" and you should find plenty of material.

Because of this, the first one is just a demonstration of what to do when the function cannot (due to time, programming skill, or something else) be converted into a vectorize-friendly function. Use Vectorize.
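
As a rough illustration of what Vectorize does, the sketch below calls quadrant_sgl once per pair of elements with mapply (the base R function that Vectorize wraps) and compares the result with the Vectorize version. The names by_mapply, quadrant_v and by_vectorize are made up for this example.

```r
quadrant_sgl <- function(cost, outcome) {
  if (cost < 0 && outcome < 0) return("SW Quadrant")
  if (cost < 0 && outcome > 0) return("Dominant")
  if (cost > 0 && outcome < 0) return("Dominated")
  return("")
}

Costs <- c(2, -5, -7, 3, 12)
Outcomes <- c(-2, 5, -7, 3, -2)

# mapply calls quadrant_sgl(Costs[i], Outcomes[i]) for each i in turn.
by_mapply <- mapply(quadrant_sgl, Costs, Outcomes)

# Vectorize returns a new function that does the same looping internally.
quadrant_v <- Vectorize(quadrant_sgl)
by_vectorize <- quadrant_v(Costs, Outcomes)

identical(by_mapply, by_vectorize)
```

Both approaches still evaluate the function once per row, so they are a convenience rather than a speed-up compared with the ifelse-style versions.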

The other functions are all more or less equivalent.

If you are using dplyr and friends, then I strongly recommend the use of quadrant_vec3, since it is (IMO) much easier to read and maintain than nested ifelses. (BTW: if you must use nested ifelse, then at least use dplyr::if_elses, nested, as they are generally safer than base R's ifelse.)
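
For reference, a nested dplyr::if_else version might look like the sketch below. The name quadrant_vec_if_else is hypothetical and not part of the original answer, and the function mirrors the structure of quadrant_vec2. The last two lines show one reason if_else is considered safer: base ifelse drops attributes such as the Date class, while if_else keeps them.

```r
# Assumes the dplyr package is installed.
quadrant_vec_if_else <- function(cost, outcome) {
  dplyr::if_else(
    cost < 0,
    dplyr::if_else(outcome < 0, "SW Quadrant", "Dominant"),
    dplyr::if_else(outcome < 0, "Dominated", "")
  )
}

quadrant_vec_if_else(c(2, -5, -7, 3, 12), c(-2, 5, -7, 3, -2))

# Base ifelse strips the class of its result; if_else preserves it.
d <- as.Date("2020-01-01")
class(ifelse(TRUE, d, d))          # "numeric": the Date class is lost
class(dplyr::if_else(TRUE, d, d))  # "Date"
```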

If you are venturing into the world of data.table, then quadrant_vec4 is the equivalent using data.table's own fcase function, mostly the same as case_when.

Demonstration:

Vectorize(quadrant_sgl, vectorize.args = c("cost", "outcome"))(results$Costs, results$Outcomes)
# [1] "Dominated"   "Dominant"    "SW Quadrant" ""            "Dominated"
quadrant_vec1(results$Costs, results$Outcomes)
# [1] "Dominated"   "Dominant"    "SW Quadrant" ""            "Dominated"
quadrant_vec2(results$Costs, results$Outcomes)
# [1] "Dominated"   "Dominant"    "SW Quadrant" ""            "Dominated"
quadrant_vec3(results$Costs, results$Outcomes)
# [1] "Dominated"   "Dominant"    "SW Quadrant" ""            "Dominated"
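
To get back to the original goal of adding a third column, any of the vectorized functions can be assigned straight into the data frame. The sketch below uses base R assignment with quadrant_vec2; this is one possible way to wire it up, not the only one.

```r
results <- data.frame(
  Costs = c(2, -5, -7, 3, 12),
  Outcomes = c(-2, 5, -7, 3, -2)
)

quadrant_vec2 <- function(cost, outcome) {
  ifelse(cost < 0,
         ifelse(outcome < 0, "SW Quadrant", "Dominant"),
         ifelse(outcome < 0, "Dominated", ""))
}

# The function returns one label per row, so it can fill a column directly.
results$Quadrant <- quadrant_vec2(results$Costs, results$Outcomes)
results

# With dplyr loaded, the equivalent call would be:
# results <- mutate(results, Quadrant = quadrant_vec2(Costs, Outcomes))
```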

