This article shows how to write an R function that groups factor levels by frequency, keeps the 2 largest categories, and merges the remaining categories into an "Other" category.

Problem description

I would like to write a function in R that takes a single factor variable and a parameter n as inputs, counts the cases in each category of the factor, keeps only the n categories with the most cases, and pools all other categories into a single category, "other". The function must be applied to multiple variables, keeping the 2 largest categories of each variable and pooling all remaining categories of that variable into "other".

Example:

var1 <- c("square", "square", "square", "circle", "square", "square", "circle",
"square", "circle", "circle", "circle", "circle", "square", "circle", "triangle", "circle", "circle", "rectangle")

var2 <- c("orange", "orange", "orange", "orange", "blue", "orange", "blue",
"blue", "orange", "blue", "blue", "blue", "orange", "orange", "orange", "orange", "green", "purple")

df <- data.frame(var1, var2)

Thank you very much!

Recommended answer

forcats::fct_lump_n() exists for exactly this purpose:

library(forcats)
library(dplyr)

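# Keep the 2 most frequent levels of every column; pool the rest into "Other"
df %>%
  mutate(across(everything(), ~ fct_lump_n(.x, n = 2)))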

     var1   var2
1  square orange
2  square orange
3  square orange
4  circle orange
5  square   blue
6  square orange
7  circle   blue
8  square   blue
9  circle orange
10 circle   blue
11 circle   blue
12 circle   blue
13 square orange
14 circle orange
15  Other orange
16 circle orange
17 circle  Other
18  Other  Other
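
Since the question asks for a hand-written function, here is a rough base R sketch of the same idea for readers who would rather not depend on forcats. The name lump_top_n and its other_level argument are hypothetical and chosen only for illustration. The sketch breaks ties alphabetically and always keeps exactly n categories, so its behaviour on tied counts may differ from fct_lump_n().

```r
# Hypothetical helper (not part of forcats): keep the n most frequent
# categories of x and pool everything else into other_level.
lump_top_n <- function(x, n = 2, other_level = "Other") {
  x <- as.character(x)
  # Count cases per category, most frequent first
  counts <- sort(table(x), decreasing = TRUE)
  keep <- names(counts)[seq_len(min(n, length(counts)))]
  # Relabel every non-missing value whose category is outside the top n
  x[!is.na(x) & !(x %in% keep)] <- other_level
  # Kept categories ordered by frequency, pooled level last (only if used)
  factor(x, levels = intersect(c(keep, other_level), x))
}

# Small made-up vector: circle appears 3 times, square 2, the others once
shapes <- c("square", "circle", "circle", "triangle",
            "square", "circle", "rectangle")
lumped <- lump_top_n(shapes, n = 2)
print(table(lumped))
```

To apply it to every column of a data frame such as df above, something like df[] <- lapply(df, lump_top_n, n = 2) should work, since lapply passes n through to the helper.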
