How to use R to check data consistency (making sure there are no contradictions between cases and values)?

Problem description

Say I have:

Person   Movie    Rating
Sally    Titanic  4
Bill     Titanic  4
Rob      Titanic  4
Sue      Cars     8
Alex     Cars     **9**
Bob      Cars     8

As you can see, Alex's row contradicts the others: every row for the same movie should have the same rating, but Alex's rating is a data-entry error. How can I use R to solve this? I've been thinking about it for a while but can't figure it out. Do I just have to do it manually in Excel or something? Is there an R command that returns all the cases where two columns contradict each other?
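One way to get such a command in base R, sketched under the assumption that the data is in a data frame called df with the columns shown above: count the distinct ratings per movie with tapply() and report every movie that has more than one. The names n_ratings and inconsistent are made up for illustration.

```r
# Example data from the question (assumed layout)
df <- data.frame(
  Person = c("Sally", "Bill", "Rob", "Sue", "Alex", "Bob"),
  Movie  = c("Titanic", "Titanic", "Titanic", "Cars", "Cars", "Cars"),
  Rating = c(4, 4, 4, 8, 9, 8),
  stringsAsFactors = FALSE
)

# Number of distinct ratings for each movie (named by movie)
n_ratings <- tapply(df$Rating, df$Movie, function(r) length(unique(r)))

# Movies whose rows do not all agree
inconsistent <- names(n_ratings)[n_ratings > 1]
inconsistent

# All rows belonging to those movies, for manual inspection
df[df$Movie %in% inconsistent, ]
```

This only says which movies are contradictory, not which row is the wrong one; with three Cars rows it returns all three.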

Perhaps I could have R do a boolean check of whether every row for a movie matches the rating of that movie's first occurrence? For every row that returns "no," I could then go and look at it manually. How would I write this function?
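The check described above can be sketched in base R with ave(), which repeats a per-group result across every row of that group. This assumes the same df layout as the table above; the names first_rating and MatchesFirst are hypothetical.

```r
# Example data from the question (assumed layout)
df <- data.frame(
  Person = c("Sally", "Bill", "Rob", "Sue", "Alex", "Bob"),
  Movie  = c("Titanic", "Titanic", "Titanic", "Cars", "Cars", "Cars"),
  Rating = c(4, 4, 4, 8, 9, 8),
  stringsAsFactors = FALSE
)

# For each row, the rating of the first row with the same Movie
first_rating <- ave(df$Rating, df$Movie, FUN = function(r) r[1])

# TRUE where the row agrees with its movie's first rating
df$MatchesFirst <- df$Rating == first_rating

# The "no" rows, to look at by hand
df[!df$MatchesFirst, ]
```

Comparing against the first occurrence only works if the first row of each movie is correct; if it is not, every other row of that movie gets flagged, which is why the answer below compares against the most common rating instead.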

Thanks

Recommended answer

data.table solution

Define a function that returns the most common value (the mode)

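# Return the most frequent value (the mode) of x, keeping x's type
# (names(table(x)) would return it as a character string);
# ties go to the value that appears first in x
Myfunc <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}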
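Myfunc relies on which.max(), which returns the position of the first maximum, so when two ratings are equally common (for example a movie with only two rows that disagree) the "correct" rating is picked arbitrarily. A minimal sketch of a hypothetical helper, has_unique_mode(), that could be used to flag such ties before trusting the result:

```r
# TRUE if exactly one value in x is the most frequent, FALSE on a tie
has_unique_mode <- function(x) {
  counts <- tabulate(match(x, unique(x)))  # occurrences of each distinct value
  sum(counts == max(counts)) == 1
}

has_unique_mode(c(8, 9, 8))  # one clear majority
has_unique_mode(c(8, 9))     # tie between 8 and 9
```

Movies for which this returns FALSE probably need a manual decision rather than an automatic correction.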

library(data.table)

Create a column with the correct (most common) rating for each movie, by reference:

setDT(df)[, CorrectRating := Myfunc(Rating), Movie][]
#    Person   Movie Rating CorrectRating
# 1:  Sally Titanic      4             4
# 2:   Bill Titanic      4             4
# 3:    Rob Titanic      4             4
# 4:    Sue    Cars      8             8
# 5:   Alex    Cars      9             8
# 6:    Bob    Cars      8             8

Or, if you want to remove the "bad" ratings:

df[Rating == CorrectRating][]
#    Person   Movie Rating CorrectRating
# 1:  Sally Titanic      4             4
# 2:   Bill Titanic      4             4
# 3:    Rob Titanic      4             4
# 4:    Sue    Cars      8             8
# 5:    Bob    Cars      8             8
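Since the question asked to review the suspicious rows rather than drop them, the filter can simply be inverted. A self-contained sketch, assuming the question's data in a data.table called dt; the mode is computed inline here so the block does not depend on Myfunc, and the name suspects is hypothetical.

```r
library(data.table)

# Example data from the question (assumed layout)
dt <- data.table(
  Person = c("Sally", "Bill", "Rob", "Sue", "Alex", "Bob"),
  Movie  = c("Titanic", "Titanic", "Titanic", "Cars", "Cars", "Cars"),
  Rating = c(4, 4, 4, 8, 9, 8)
)

# Most common rating per movie, added by reference
dt[, CorrectRating := {
  ux <- unique(Rating)
  ux[which.max(tabulate(match(Rating, ux)))]
}, by = Movie]

# Rows whose rating disagrees with their movie's mode, for manual review
suspects <- dt[Rating != CorrectRating]
suspects
```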


09-22 15:53