I have already looked at:

  • How to add a factor column to dataframe based on a conditional statement from another column?
  • How to add column into a dataframe based on condition in R programming
  • R: Add column with condition-check on three columns?

All of these examples test numeric vectors or NA values in other columns and then add a new variable. Here is a short reproducible example:
    x <- c("dec 12", "jan 13", "feb 13", "march 13", "apr 13", "may 13",
           "june 13", "july 13", "aug 13", "sep 13", "oct 13", "nov 13")
    y <- c(234, 678, 534, 122, 179, 987, 872, 730, 295, 450, 590, 312)
    df<-data.frame(x,y)
    

I would like to add "winter" for df$x = dec | jan | feb, "spring" for march | apr | may, and likewise "summer" and "autumn".

I tried:
    df$season <- ifelse(df[1:3, ], "winter", ifelse(df[4:6, ], "spring",
                        ifelse(df[7:9, ], "summer", "autumn")))
    

I know this is a very inefficient way of doing things, but I am a newbie and a kludger. It returned the error:
    Error in ifelse(df[1:3, ], "winter", ifelse(df[4:6, ], "spring",
    ifelse(df[7:9,  : (list) object cannot be coerced to type 'logical'
    

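If the same data frame had thousands of rows and I wanted to loop through it and create a new variable for the season based on the month of the year, how could I do this? I referred to "Looping through a data frame to add a column depending variables in other columns", but that question loops and applies a mathematical operator to create the new variable. I also tried external resources: a thread on the R mailing list and a thread on the TalkStats forum. However, both are again based on numeric variables and conditions.

Best answer

If you have a really large data frame, then data.table will be very helpful to you. The following works: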

    library(data.table)
    x <- c("dec 12", "jan 13", "feb 13", "march 13", "apr 13", "may 13",
           "june 13", "july 13", "aug 13", "sep 13", "oct 13", "nov 13")
    y <- c(234, 678, 534, 122, 179, 987, 872, 730, 295, 450, 590, 312)
    df <- data.frame(x, y)
    DT <- data.table(df)
    # Add a helper column holding the first three letters of the lower-cased label
    DT[, month := substr(tolower(x), 1, 3)]
    # Map each month abbreviation to a season; anything unmatched becomes NA
    DT[, season := ifelse(month %in% c("dec", "jan", "feb"), "winter",
                   ifelse(month %in% c("mar", "apr", "may"), "spring",
                   ifelse(month %in% c("jun", "jul", "aug"), "summer",
                   ifelse(month %in% c("sep", "oct", "nov"), "autumn", NA))))]
    DT
               x   y month season
     1:   dec 12 234   dec winter
     2:   jan 13 678   jan winter
     3:   feb 13 534   feb winter
     4: march 13 122   mar spring
     5:   apr 13 179   apr spring
     6:   may 13 987   may spring
     7:  june 13 872   jun summer
     8:  july 13 730   jul summer
     9:   aug 13 295   aug summer
    10:   sep 13 450   sep autumn
    11:   oct 13 590   oct autumn
    12:   nov 13 312   nov autumn
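
As for the error in the question: `ifelse()` expects a logical vector as its first argument, while `df[1:3, ]` is a data frame (a list), which is why R complains that it cannot be coerced to type 'logical'. The same idea can also be written without data.table. The following is only a base R sketch that reuses the `%in%` tests from the answer; the helper column name `month` is carried over from it.

```r
# Rebuild the example data from the question
x <- c("dec 12", "jan 13", "feb 13", "march 13", "apr 13", "may 13",
       "june 13", "july 13", "aug 13", "sep 13", "oct 13", "nov 13")
y <- c(234, 678, 534, 122, 179, 987, 872, 730, 295, 450, 590, 312)
df <- data.frame(x, y)

# The first three letters of the lower-cased label identify the month
df$month <- substr(tolower(df$x), 1, 3)

# Each test is a logical vector with one element per row,
# which is the kind of first argument ifelse() needs
df$season <- ifelse(df$month %in% c("dec", "jan", "feb"), "winter",
             ifelse(df$month %in% c("mar", "apr", "may"), "spring",
             ifelse(df$month %in% c("jun", "jul", "aug"), "summer",
             ifelse(df$month %in% c("sep", "oct", "nov"), "autumn", NA))))
df
```

Because every test is vectorised over the whole column, no explicit loop over rows is needed, even for thousands of rows.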
    
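The nested `ifelse()` calls can get hard to read as the number of categories grows. One possible alternative, shown here only as a sketch, is a named lookup vector indexed by the month abbreviation. The name `season_of` is hypothetical and does not appear in the original answer.

```r
# Rebuild the example data from the question
x <- c("dec 12", "jan 13", "feb 13", "march 13", "apr 13", "may 13",
       "june 13", "july 13", "aug 13", "sep 13", "oct 13", "nov 13")
y <- c(234, 678, 534, 122, 179, 987, 872, 730, 295, 450, 590, 312)
df <- data.frame(x, y)

# Hypothetical lookup table: names are month abbreviations, values are seasons
season_of <- c(dec = "winter", jan = "winter", feb = "winter",
               mar = "spring", apr = "spring", may = "spring",
               jun = "summer", jul = "summer", aug = "summer",
               sep = "autumn", oct = "autumn", nov = "autumn")

# The first three letters of the lower-cased label identify the month
month <- substr(tolower(df$x), 1, 3)

# Indexing by name returns one season per row;
# an abbreviation that is not in the table yields NA
df$season <- unname(season_of[month])
df
```

The mapping lives in one place, so adding or renaming a category means editing the lookup vector rather than the nesting.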

Regarding "r - Add a column to a data frame by testing a categorical variable in another column", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/22125224/
