Question
I am trying to do some k-means clustering on a very large matrix.
The matrix is approximately 500,000 rows x 4,000 columns, yet very sparse (only a couple of "1" values per row).
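A back-of-the-envelope estimate shows how much the storage format matters. It assumes 8-byte doubles, 32-bit integer indices, and about two non-zeros per row (the question only says "a couple"), stored in a generic compressed sparse row (CSR) layout:

```latex
\begin{aligned}
\text{dense: } & 500\,000 \times 4000 \times 8\ \text{B} = 1.6 \times 10^{10}\ \text{B} \approx 16\ \text{GB} \\
\text{CSR: } & \underbrace{10^{6} \times 8\ \text{B}}_{\text{values}}
  + \underbrace{10^{6} \times 4\ \text{B}}_{\text{column indices}}
  + \underbrace{500\,001 \times 4\ \text{B}}_{\text{row pointers}}
  = 14\,000\,004\ \text{B} \approx 14\ \text{MB}
\end{aligned}
```

Under these assumptions the data fits comfortably in RAM as long as it is never expanded into a dense matrix.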
The whole thing does not fit into memory, so I converted it into a sparse ARFF file. But R apparently can't read the sparse ARFF format. I also have the data as a plain CSV file.
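As a language-neutral illustration of what "loading sparsely" means (written in Python, not an R package API), the sketch below streams the plain CSV one line at a time and keeps only the column indices of the "1" cells, in CSR layout. It assumes a comma-separated file of numeric 0/1 values with no header row; the function name read_binary_csv_as_csr and the path data.csv are hypothetical.

```python
import csv
import io
from array import array
from typing import TextIO, Tuple


def read_binary_csv_as_csr(f: TextIO) -> Tuple[array, array, int]:
    """Stream a dense 0/1 CSV (hypothetical helper) into CSR index arrays.

    Assumes numeric cells and no header row. The non-zero columns of row r
    are indices[indptr[r]:indptr[r + 1]]; every stored value is 1, so no
    separate value array is kept.
    """
    indptr = array("i", [0])  # start offset of each row inside `indices`
    indices = array("i")      # column index of each non-zero cell
    n_cols = 0
    for row in csv.reader(f):  # one CSV line at a time, never the whole file
        if not row:
            continue  # skip blank lines
        n_cols = max(n_cols, len(row))
        for j, cell in enumerate(row):
            if float(cell) != 0.0:
                indices.append(j)
        indptr.append(len(indices))
    return indptr, indices, n_cols


if __name__ == "__main__":
    # With a real file: open("data.csv", newline="") (hypothetical path).
    demo = io.StringIO("0,1,0,0\n1,0,0,1\n0,0,0,0\n")
    indptr, indices, n_cols = read_binary_csv_as_csr(demo)
    print(list(indptr), list(indices), n_cols)  # [0, 1, 3, 3] [1, 0, 3] 4
```

Peak memory grows with the number of non-zeros rather than with rows x columns, because each parsed CSV line is discarded once its non-zero positions have been recorded.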
Is there any package available in R for loading such sparse matrices efficiently? I'd then proceed with the regular k-means algorithm (kmeans() from R's built-in stats package).
Many thanks.
Answer
The bigmemory package (now a family of packages; see their website) uses k-means as a running example of extended analytics on large data. See in particular the sub-package biganalytics, which contains a k-means function, bigkmeans().
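To show why row sparsity still helps once the data is loaded, here is a minimal sketch of standard Lloyd's k-means in Python, where each row is passed as the list of column indices holding a 1 (for example, slices of a CSR index array). It is a generic illustration under that 0/1-data assumption, not the algorithm inside biganalytics' bigkmeans(); the function name sparse_binary_kmeans and its seed/init_rows parameters are hypothetical. The key identity is that for a binary row x, ||x - c||² = nnz(x) - 2·Σ_{j∈x} c_j + ||c||², so each distance only touches the row's non-zeros.

```python
import random
from typing import List, Optional, Sequence, Tuple


def sparse_binary_kmeans(
    rows: Sequence[Sequence[int]],
    n_cols: int,
    k: int,
    iters: int = 20,
    seed: int = 0,
    init_rows: Optional[Sequence[int]] = None,
) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's k-means where each row is the list of columns holding a 1.

    Centroids are dense (k x n_cols); data rows are never densified.
    """
    if init_rows is None:
        init_rows = random.Random(seed).sample(range(len(rows)), k)
    centroids: List[List[float]] = []
    for r in init_rows:  # start from k chosen data rows
        c = [0.0] * n_cols
        for j in rows[r]:
            c[j] = 1.0
        centroids.append(c)

    labels = [-1] * len(rows)  # -1 means "not assigned yet"
    for _ in range(iters):
        sq_norms = [sum(v * v for v in c) for c in centroids]
        changed = False
        for i, row in enumerate(rows):
            # For a 0/1 row x: ||x - c||^2 = nnz(x) - 2 * sum_{j in x} c[j] + ||c||^2,
            # so each distance costs O(nnz(x)) instead of O(n_cols).
            dists = [len(row) - 2.0 * sum(c[j] for j in row) + sq
                     for c, sq in zip(centroids, sq_norms)]
            best = min(range(k), key=dists.__getitem__)  # first index on ties
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break  # assignments are stable: converged
        # Update step: each centroid becomes the mean of its assigned rows.
        sums = [[0.0] * n_cols for _ in range(k)]
        counts = [0] * k
        for i, row in enumerate(rows):
            counts[labels[i]] += 1
            for j in row:
                sums[labels[i]][j] += 1.0
        for m in range(k):
            if counts[m]:  # an empty cluster keeps its previous centroid
                centroids[m] = [s / counts[m] for s in sums[m]]
    return labels, centroids
```

For example, with rows [[0, 1], [0], [1], [0, 1], [2, 3], [2], [3], [2, 3]], n_cols=4, k=2 and init_rows=[0, 4], the function returns labels [0, 0, 0, 0, 1, 1, 1, 1] and centroids [[0.75, 0.75, 0.0, 0.0], [0.0, 0.0, 0.75, 0.75]]. Only the k centroids are dense, which for 4,000 columns is a small fraction of the memory a dense 500,000-row matrix would need.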