Problem description

Suppose my data frame (mydata) has three variables: 1) id, 2) case, and 3) value.

    mydata <- data.frame(
      id    = c(1,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),
      case  = c("a","b","c","c","b","a","b","c","c","a","b","c","c","a","b","c","a"),
      value = c(1,34,56,23,34,546,34,67,23,65,23,65,23,87,34,321,87)
    )

    mydata
       id case value
    1   1    a     1
    2   1    b    34
    3   1    c    56
    4   1    c    23
    5   1    b    34
    6   2    a   546
    7   2    b    34
    8   2    c    67
    9   2    c    23
    10  3    a    65
    11  3    b    23
    12  3    c    65
    13  3    c    23
    14  4    a    87
    15  4    b    34
    16  4    c   321
    17  4    a    87

Within each id, the same case can appear more than once, and its values may be the same or different. If a row has the same id, case, and value as an earlier row, I only need to keep one copy and remove the duplicate.

My final data would then be

       id case value
    1   1    a     1
    2   1    b    34
    3   1    c    56
    4   1    c    23
    5   2    a   546
    6   2    b    34
    7   2    c    67
    8   2    c    23
    9   3    a    65
    10  3    b    23
    11  3    c    65
    12  3    c    23
    13  4    a    87
    14  4    b    34
    15  4    c   321

Solution

You could try duplicated. It flags every row whose id, case, and value have already appeared in an earlier row, so negating it keeps the first occurrence of each combination:

    mydata[!duplicated(mydata[, c('id', 'case', 'value')]), ]
    #    id case value
    # 1   1    a     1
    # 2   1    b    34
    # 3   1    c    56
    # 4   1    c    23
    # 6   2    a   546
    # 7   2    b    34
    # 8   2    c    67
    # 9   2    c    23
    # 10  3    a    65
    # 11  3    b    23
    # 12  3    c    65
    # 13  3    c    23
    # 14  4    a    87
    # 15  4    b    34
    # 16  4    c   321

The original row names are kept, which is why 5 and 17 are missing from the output.

Or use unique with the by option from data.table. Here an extra column, value1, is added to show that only the columns listed in by decide which rows count as duplicates:

    library(data.table)
    set.seed(25)
    mydata1 <- cbind(mydata, value1 = rnorm(17))
    DT <- as.data.table(mydata1)
    unique(DT, by = c('id', 'case', 'value'))
    #     id case value      value1
    #  1:  1    a     1 -0.21183360
    #  2:  1    b    34 -1.04159113
    #  3:  1    c    56 -1.15330756
    #  4:  1    c    23  0.32153150
    #  5:  2    a   546 -0.44553326
    #  6:  2    b    34  1.73404543
    #  7:  2    c    67  0.51129562
    #  8:  2    c    23  0.09964504
    #  9:  3    a    65 -0.05789111
    # 10:  3    b    23 -1.74278763
    # 11:  3    c    65 -1.32495298
    # 12:  3    c    23 -0.54793388
    # 13:  4    a    87 -1.45638428
    # 14:  4    b    34  0.08268682
    # 15:  4    c   321  0.92757895
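As a quick sanity check on the duplicated approach from the answer, the following base R sketch rebuilds the example data and inspects which rows are flagged. The names key_cols, is_dup, and deduped are illustrative and do not come from the original answer. It also assumes that the key covers every column of mydata, which is true for this example but not in general.

```r
# Rebuild the example data from the question.
mydata <- data.frame(
  id    = c(1,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),
  case  = c("a","b","c","c","b","a","b","c","c","a","b","c","c","a","b","c","a"),
  value = c(1,34,56,23,34,546,34,67,23,65,23,65,23,87,34,321,87)
)

# Columns that together define a duplicate (illustrative name).
key_cols <- c("id", "case", "value")

# TRUE for each row whose key combination already appeared in an earlier row.
is_dup <- duplicated(mydata[, key_cols])

# Positions of the flagged rows: row 5 repeats row 2, row 17 repeats row 14.
which(is_dup)

# Keep only the first occurrence of each combination.
deduped <- mydata[!is_dup, ]
nrow(deduped)

# Assumption: the key covers all columns, so unique() should select the same rows.
isTRUE(all.equal(deduped, unique(mydata)))
```

When the data frame has extra columns that should not take part in the comparison, unique(mydata) would no longer be equivalent, and subsetting with duplicated on the chosen key columns (or the by argument in data.table) is the safer choice.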