k-means clustering in R on a very large sparse matrix?


Problem description

I am trying to do some k-means clustering on a very large matrix.

The matrix is approximately 500000 rows x 4000 cols yet very sparse (only a couple of "1" values per row).

The whole thing does not fit into memory, so I converted it into a sparse ARFF file. But R obviously can't read the sparse ARFF file format. I also have the data as a plain CSV file.

Is there any package available in R for loading such sparse matrices efficiently? I'd then use the regular k-means algorithm from the cluster package to proceed.

Many thanks.
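To make the shape of the data concrete, here is a minimal sketch of how a matrix like the one described (mostly zeros, a couple of "1" values per row) can be held in R as row/column index pairs rather than as a dense table. It assumes the `Matrix` package, which is not mentioned in the question, and uses made-up toy data; the names `row_idx` and `col_idx` are hypothetical. It only illustrates the storage idea and is not a tested answer for the full 500000 x 4000 case.

```r
library(Matrix)  # assumption: the Matrix package is installed (it ships with standard R)

# Hypothetical toy data: 6 rows x 10 columns, two "1" entries per row.
# Only the positions of the non-zero entries are stored.
row_idx <- rep(1:6, each = 2)
col_idx <- c(1, 4, 2, 9, 3, 4, 5, 10, 1, 7, 6, 8)

# Build a compressed sparse matrix from the (row, column) pairs;
# x = 1 is recycled so every listed position holds the value 1.
m <- sparseMatrix(i = row_idx, j = col_idx, x = 1, dims = c(6, 10))

nnz <- length(m@x)        # number of stored non-zero entries
row_counts <- rowSums(m)  # non-zero count per row
```

Only the non-zero positions are kept, so the number of stored entries grows with the count of "1" values rather than with rows times columns.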

Recommended answer