Building a user-based collaborative filtering recommender system in R

Problem description

I have a matrix with 129539 rows and 530 columns. The first column corresponds to ClientIDs and the first row to product brands. Each cell holds a ranking index for that ClientID and product brand (0 if the ClientID never bought the product, otherwise a value up to 10).

I am building a User Based Collaborative Filtering Recommender System in R, using the first 5000 rows for training, and it gives me an output that doesn't make sense to me.

The code I use to generate it is the following:

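# Load the pre-computed affinity data
library(recommenderlab)
affinity.data <- read.csv("mydirectory")
affinity.matrix <- as(affinity.data, "realRatingMatrix")

# Create the model - U(ser) B(ased) C(ollaborative) F(iltering)
Rec.model <- Recommender(affinity.matrix[1:5000], method = "UBCF")

# Recommended top 5 items for user 1507323
recommended.items.1507323 <- predict(Rec.model, affinity.matrix["1507323", ], n = 5)

# Display them
as(recommended.items.1507323, "list")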

The output I'm getting is a list of values such as:
[[1]]
[1] "0.0061652281134402" "0.00661813368630046" "0.0119331742243437" "0.0136147038801906"
[5] "0.0138312586445367"

I was expecting the names of the brands that I am trying to recommend, not a list of numbers.

PS: my original matrix has values from 0 to 10 (decimals included, not only integers).

Thank you very much for any help or clarification you may have.

Recommended answer

There are a couple of issues here: first, the predict() function will return the predicted rating for each item for the user you chose. If you want to recommend a Top N list, you'll have to predict the rating for every item for that user, then sort the ratings and return the top N.
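The sort-and-cut step described above can be sketched in base R. This is only a minimal illustration, assuming the predicted ratings for one user are already available as a named numeric vector; `predicted` and `top_n_items` are hypothetical names and the values are made up, not output from the asker's model.

```r
# Hypothetical predicted ratings for a single user, named by brand.
# NA marks a brand for which no prediction could be made.
predicted <- c(BrandA = 2.1, BrandB = NA, BrandC = 7.4, BrandD = 5.0)

# Return the names of the n highest-rated items.
# sort() drops NA values by default, so unpredicted brands are never recommended.
top_n_items <- function(pred, n) {
  ranked <- sort(pred, decreasing = TRUE)
  head(names(ranked), n)
}

print(top_n_items(predicted, 2))
# [1] "BrandC" "BrandD"
```

Returning names rather than values is what turns a vector of scores into a list of brands to recommend.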

Second, recommender systems normally use NULL, NA, or missing data when a user and an item have never interacted. You've used 0 for this. That means the predictions will be heavily skewed toward 0 (given that most users don't interact with most items), and your predictions are effectively estimating how likely a user is to interact with an item at all. This may be a feature or a bug, depending on your use case. But if your ratings 1-10 represent preferences and 0 represents a binary used/not used flag, then you're mixing two kinds of information and you should replace 0 with NA.
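Replacing 0 with NA might look like the following base R sketch. The small data frame is a made-up stand-in for the affinity table, assuming (as in the question) that the first column holds the ClientID and the remaining columns are brands; the brand names and values are illustrative only.

```r
# Toy stand-in for the affinity table: first column is ClientID, the rest are brands.
affinity.data <- data.frame(
  ClientID = c("1507323", "1507324", "1507325"),
  BrandA   = c(0, 3.5, 10),
  BrandB   = c(7.2, 0, 0),
  BrandC   = c(0, 0, 1.5)
)

# Move the IDs into row names so that only ratings remain in the matrix body.
ratings <- as.matrix(affinity.data[, -1])
rownames(ratings) <- as.character(affinity.data$ClientID)

# Treat 0 as "never bought" and mark it as missing rather than as a rating.
ratings[ratings == 0] <- NA

print(ratings)
```

A numeric matrix like `ratings`, with users as row names and brands as column names, keeps the brand labels attached to the data, so whatever is recommended later can be reported by name.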

08-27 14:20