How to get centroids from SciPy's hierarchical agglomerative clustering?

Problem description

I am using SciPy's hierarchical agglomerative clustering methods to cluster an m x n matrix of features, but after the clustering is complete, I can't figure out how to get the centroids of the resulting clusters. My code is below:

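from scipy.cluster import hierarchy
from scipy.spatial import distance

Y = distance.pdist(features)  # condensed pairwise Euclidean distances
Z = hierarchy.linkage(Y, method="average", metric="euclidean")
T = hierarchy.fcluster(Z, 100, criterion="maxclust")  # at most 100 flat clusters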

I take my feature matrix, compute the pairwise Euclidean distances between its rows, and pass those distances to the hierarchical clustering method. From there, I create flat clusters, with a maximum of 100 clusters.
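For readers who want to run the steps described above, here is a minimal self-contained sketch. The random `features` matrix and its size (500 x 8) are assumptions made purely for illustration; they do not come from the question.

```python
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial import distance

# Synthetic stand-in for the m x n feature matrix (assumed data, for illustration only).
rng = np.random.default_rng(0)
features = rng.normal(size=(500, 8))

Y = distance.pdist(features)                 # condensed distances, length m*(m-1)/2
Z = hierarchy.linkage(Y, method="average")   # linkage matrix with m-1 rows and 4 columns
T = hierarchy.fcluster(Z, 100, criterion="maxclust")  # one integer label per observation

print(Y.shape, Z.shape, T.shape)
print(T.min(), T.max())  # fcluster labels start at 1
```

The point to note is that `T` holds one cluster label per row of `features`, so any per-cluster statistic can be computed by selecting rows with a boolean mask such as `features[T == label]`.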

Now, based on the flat clusters T, how do I get the 1 x n centroid that represents each flat cluster?

Recommended answer

One possible solution is a function that returns a codebook of centroids, as kmeans in scipy.cluster.vq does. All you need is the partition vector of flat cluster labels, part, and the original observations, X.

import numpy as np


def to_codebook(X, part):
    """
    Calculates centroids according to flat cluster assignment

    Parameters
    ----------
    X : array, (n, d)
        The n original observations with d features

    part : array, (n,)
        Partition vector. part[i] = c is the cluster assigned to observation i

    Returns
    -------
    codebook : array, (k, d)
        Returns a k x d codebook with k centroids
    """
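    codebook = []

    # One centroid per label that actually occurs in part. Iterating over
    # np.unique avoids NaN rows when a label in the min..max range is unused.
    for i in np.unique(part):
        codebook.append(X[part == i].mean(axis=0))

    return np.vstack(codebook)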
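As an illustration of how this idea connects to the output of fcluster, the sketch below runs the whole pipeline on synthetic data and computes the per-cluster means without a Python loop. The helper name `centroids_by_label`, the random data, and the choice of 10 clusters are all hypothetical, introduced only for this example; it is one possible variant, not the answer's own code.

```python
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial import distance


def centroids_by_label(X, labels):
    """Hypothetical helper: mean of the rows of X for each distinct label."""
    uniq, inverse = np.unique(labels, return_inverse=True)
    sums = np.zeros((uniq.size, X.shape[1]))
    np.add.at(sums, inverse, X)  # accumulate each row into its cluster's sum
    counts = np.bincount(inverse, minlength=uniq.size)
    return uniq, sums / counts[:, None]


# Assumed synthetic observations: 200 points with 5 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))

Z = hierarchy.linkage(distance.pdist(X), method="average")
T = hierarchy.fcluster(Z, 10, criterion="maxclust")

labels, codebook = centroids_by_label(X, T)
print(codebook.shape)  # one row per flat cluster, one column per feature

# Row j of codebook is the centroid of the cluster labelled labels[j].
assert np.allclose(codebook[0], X[T == labels[0]].mean(axis=0))
```

Returning the labels alongside the codebook makes the row-to-cluster mapping explicit, which matters because fcluster labels start at 1 while array rows start at 0.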
