How to assign new observations to existing KMeans clusters based on nearest cluster centre logic in Python?


Problem description

I used the code below to create k-means clusters using scikit-learn.

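from sklearn.cluster import KMeans

# n_jobs and algorithm='full' are only accepted by older scikit-learn releases.
kmean = KMeans(n_clusters=nclusters, n_jobs=-1, random_state=2376, max_iter=1000, n_init=1000, algorithm='full', init='k-means++')

kmean_fit = kmean.fit(clus_data)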

I also saved the cluster centroids using kmean_fit.cluster_centers_.

I then pickled the k-means object.

import pickle
filename = pickle_path + '\\' + '_kmean_fit.sav'
with open(filename, 'wb') as f:  # the with-block closes the file after writing
    pickle.dump(kmean_fit, f)

That way, I can later use kmean_fit.predict().

Question:

  1. Will loading the pickled KMeans object and applying kmean_fit.predict() allow me to assign a new observation to the existing clusters based on the centroids of those clusters? Or does this approach just recluster from scratch on the new data?

If this method won't work, how can I assign a new observation to the existing clusters with efficient Python code, given that I have already saved the cluster centroids?

PS: I know that building a classifier with the existing clusters as the dependent variable is another way, but I don't want to do that because of a time crunch.

Recommended answer

Yes. Whether or not the sklearn.cluster.KMeans object has been pickled does not affect your ability to use the predict method to cluster a new observation (if you un-pickle it correctly, you'll be dealing with the "same" original object). predict does not refit the model: it assigns each sample to the closest of the cluster centres stored in cluster_centers_.
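To illustrate that predict only assigns and does not recluster, here is a minimal sketch. The random data, the number of clusters, and the variable names are assumptions made up for the illustration; they do not come from the question.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data, generated only for this illustration.
rng = np.random.default_rng(0)
train = rng.normal(size=(200, 4))        # data the model is fitted on
new_obs = rng.normal(size=(5, 4)) + 3.0  # observations that arrive later

model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(train)
centers_before = model.cluster_centers_.copy()

# predict() labels each row with the index of its nearest existing centroid.
labels = model.predict(new_obs)

# The fitted centroids are untouched by predict().
centers_unchanged = np.array_equal(centers_before, model.cluster_centers_)
print(labels, centers_unchanged)
```

If the centroids changed after the call, that would indicate a refit; calling fit or fit_predict on the new data is what would recluster from scratch.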

An example:

from sklearn.cluster import KMeans
import joblib  # sklearn.externals.joblib was removed in newer scikit-learn releases

model = KMeans(n_clusters = 2, random_state = 100)
X = [[0,0,1,0], [1,0,0,1], [0,0,0,1],[1,1,1,0],[0,0,0,0]]
model.fit(X)

Out:

KMeans(copy_x=True, init='k-means++', max_iter=300, n_clusters=2, n_init=10,
    n_jobs=1, precompute_distances='auto', random_state=100, tol=0.0001,
    verbose=0)

Continuing:

joblib.dump(model, 'model.pkl')
model_loaded = joblib.load('model.pkl')

model_loaded

Out:

KMeans(copy_x=True, init='k-means++', max_iter=300, n_clusters=2, n_init=10,
    n_jobs=1, precompute_distances='auto', random_state=100, tol=0.0001,
    verbose=0)

See how the n_clusters and random_state parameters are the same between the model and model_loaded objects? You're good to go.
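Comparing the printed parameters by eye can also be done programmatically. The sketch below is one possible check, not part of the original answer: it writes the model to a temporary directory (an assumption, to avoid leaving files behind) and compares the parameters, the fitted centroids, and the predictions of the two objects.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[0, 0, 1, 0], [1, 0, 0, 1], [0, 0, 0, 1],
              [1, 1, 1, 0], [0, 0, 0, 0]], dtype=float)
model = KMeans(n_clusters=2, n_init=10, random_state=100).fit(X)

# Round trip through a file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")
    joblib.dump(model, path)
    model_loaded = joblib.load(path)

# Constructor parameters, fitted centroids, and predictions should all match.
same_params = model.get_params() == model_loaded.get_params()
same_centers = np.array_equal(model.cluster_centers_, model_loaded.cluster_centers_)
same_labels = np.array_equal(model.predict(X), model_loaded.predict(X))
print(same_params, same_centers, same_labels)
```

The centroid comparison is the more meaningful check, since the constructor parameters alone say nothing about what was learned during fit.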

Predict with the "new" model:

model_loaded.predict([[0,0,0,0]])  # predict expects a 2D array: one row per observation

Out[64]: array([0])
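For the fallback raised in the question (assigning new observations when only the saved centroids are at hand), a nearest-centroid lookup can be written directly in NumPy. This is a sketch under assumptions: assign_to_nearest is a hypothetical helper name, the data is randomly generated, and plain Euclidean distance is used, which is the metric k-means itself minimises.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data, generated only for this illustration.
rng = np.random.default_rng(42)
train = rng.normal(size=(300, 4))
new_obs = rng.normal(size=(10, 4))

model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(train)
centroids = model.cluster_centers_  # stand-in for centroids saved earlier


def assign_to_nearest(points, centroids):
    """Return, for each row of points, the index of the closest centroid."""
    # Broadcasting gives an array of shape (n_points, n_centroids, n_features).
    diffs = points[:, None, :] - centroids[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=2)  # squared Euclidean distances
    return sq_dists.argmin(axis=1)


manual_labels = assign_to_nearest(new_obs, centroids)
print(manual_labels)
print(model.predict(new_obs))
```

The broadcast builds an intermediate array of size n_points × n_centroids × n_features, so for very large batches it may be preferable to process the new observations in chunks.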

