R: k-means clustering on a very large sparse matrix?

Question

I am trying to do some k-means clustering on a very large matrix.

The matrix is approximately 500,000 rows × 4,000 columns, but very sparse (only a couple of "1" values per row).

The whole thing does not fit into memory, so I converted it into a sparse ARFF file. But R obviously can't read the sparse ARFF file format. I also have the data as a plain CSV file.

Is there any package available in R for loading such sparse matrices efficiently? I'd then use the regular k-means algorithm (`kmeans()`, which is in R's base `stats` package rather than in `cluster`) to proceed.
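
One way to hold data like this in memory is the `Matrix` package (a recommended package shipped with R), which the original thread does not mention. The sketch below assumes the nonzero cells have been exported as 1-based (row, column) index pairs; that layout and the tiny example data are hypothetical stand-ins for the asker's actual CSV. At 500,000 × 4,000 with a couple of nonzeros per row, a sparse matrix stores roughly a million entries instead of two billion cells.

```r
library(Matrix)

# Hypothetical (row, column) index pairs of the nonzero cells; in practice
# these would be read from a file instead of typed in.
idx <- data.frame(
  row = c(1, 1, 2, 3, 3, 4),
  col = c(2, 5, 1, 3, 4, 5)
)

# sparseMatrix() stores only the nonzero entries (compressed sparse column
# format, class dgCMatrix); every listed cell gets the value 1.
X <- sparseMatrix(i = idx$row, j = idx$col, x = 1, dims = c(4, 5))

print(dim(X))     # number of rows and columns
print(nnzero(X))  # number of stored nonzero entries
```

For the real data, `idx` might come from something like `read.csv("nonzeros.csv", header = FALSE, col.names = c("row", "col"))`, where the file name is hypothetical. If the data can be exported in Matrix Market format instead, `Matrix::readMM()` reads it directly into a sparse matrix.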

Many thanks.

Recommended answer

The bigmemory package (or now family of packages; see their website) used k-means as a running example of extended analytics on large data. See in particular the sub-package biganalytics, which contains the k-means function `bigkmeans()`.
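
As a different route from the one recommended above, Lloyd's k-means can also be run directly on a sparse `Matrix` object so the data is never densified. The sketch below is a hand-rolled, hypothetical helper (`sparse_kmeans` is not part of biganalytics or any package): it expands the squared Euclidean distance as ‖x‖² − 2x·c + ‖c‖², so the only product involving the data is a sparse-times-dense multiplication, and it recomputes centres through a sparse cluster-indicator matrix. It assumes k is small enough that the dense n × k distance matrix fits in memory.

```r
library(Matrix)

# Hypothetical helper: Lloyd's k-means on a sparse matrix X (rows = points).
sparse_kmeans <- function(X, k, iter_max = 20, seed = 1) {
  set.seed(seed)
  n <- nrow(X)
  # Initialise the k centres as k randomly chosen rows (dense k x p matrix)
  centers <- as.matrix(X[sample.int(n, k), , drop = FALSE])
  x_sq <- rowSums(X^2)   # squared norm of every row, computed sparsely
  cluster <- integer(n)
  for (iter in seq_len(iter_max)) {
    cross <- as.matrix(X %*% t(centers))      # n x k dot products
    c_sq <- rowSums(centers^2)
    d2 <- outer(x_sq, c_sq, "+") - 2 * cross  # n x k squared distances
    new_cluster <- max.col(-d2, ties.method = "first")  # nearest centre
    if (identical(new_cluster, cluster)) break          # assignments stable
    cluster <- new_cluster
    # k x n indicator matrix: Z[c, i] = 1 if point i is in cluster c
    Z <- sparseMatrix(i = cluster, j = seq_len(n), x = 1, dims = c(k, n))
    sizes <- rowSums(Z)
    sums <- as.matrix(Z %*% X)                # per-cluster column sums
    nonempty <- sizes > 0                     # empty clusters keep old centre
    centers[nonempty, ] <- sums[nonempty, , drop = FALSE] / sizes[nonempty]
  }
  list(cluster = cluster, centers = centers)
}

# Usage: two obvious groups of binary rows (made-up toy data)
X <- sparseMatrix(
  i = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6),
  j = c(1, 2, 1, 2, 1, 2, 5, 6, 5, 6, 5, 6),
  x = 1, dims = c(6, 6)
)
fit <- sparse_kmeans(X, k = 2)
print(fit$cluster)
```

Like `kmeans()`, this only finds a local optimum that depends on the random starting rows, so on real data it is worth running it with several seeds and keeping the result with the smallest within-cluster sum of squares.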

09-25 07:29