How to assign new observations to existing KMeans clusters based on nearest cluster center logic in Python?


Problem description

I used the code below to create k-means clusters using scikit-learn.

kmean = KMeans(n_clusters=nclusters,n_jobs=-1,random_state=2376,max_iter=1000,n_init=1000,algorithm='full',init='k-means++')

kmean_fit = kmean.fit(clus_data)

I also saved the cluster centroids from kmean_fit.cluster_centers_

I then pickled the KMeans object.

filename = pickle_path+'\\'+'_kmean_fit.sav'
with open(filename, 'wb') as f:  # the with-block closes the file after writing
    pickle.dump(kmean_fit, f)

This way, I can use kmean_fit.predict().

Question:

Will loading the pickled KMeans object and applying kmean_fit.predict() let me assign new observations to the existing clusters based on the centroids of those clusters? Or does this approach just recluster from scratch on the new data?

If this method won't work, how can I assign new observations to the existing clusters with efficient Python code, given that I have already saved the cluster centroids?

PS: I know that building a classifier with the existing clusters as the dependent variable is another way, but I don't want to do that because of a time crunch.

Recommended answer

Yes. Whether or not the sklearn.cluster.KMeans object has been pickled makes no difference (if you un-pickle it correctly, you'll be dealing with the "same" original object): you can still use the predict method to assign a new observation to a cluster. predict does not refit the model; it assigns each sample to the closest of the cluster centers already stored in cluster_centers_.
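To make the nearest-centroid rule concrete, and to cover the case where only the centroids were saved, here is a minimal NumPy sketch. The function name assign_to_nearest_centroid and the sample centroids and points are made up for illustration; they are not part of scikit-learn or of the question's data.

```python
import numpy as np

def assign_to_nearest_centroid(centroids, observations):
    """Return, for each observation, the index of the closest centroid (Euclidean)."""
    centroids = np.asarray(centroids, dtype=float)                        # shape (k, d)
    observations = np.atleast_2d(np.asarray(observations, dtype=float))  # shape (n, d)
    # Pairwise squared distances via broadcasting: shape (n, k)
    diffs = observations[:, None, :] - centroids[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=2)
    return sq_dists.argmin(axis=1)

# Hypothetical saved centroids and new observations, for illustration only
saved_centroids = [[0.0, 0.0], [10.0, 10.0]]
new_points = [[1.0, 2.0], [9.0, 8.0], [4.0, 4.0]]
labels = assign_to_nearest_centroid(saved_centroids, new_points)
print(labels)  # [0 1 0]
```

The new observations must be preprocessed (for example scaled) the same way as the data the centroids were computed from, otherwise the distances are not comparable.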

An example:

from sklearn.cluster import KMeans
import joblib  # sklearn.externals.joblib was removed from newer scikit-learn releases

model = KMeans(n_clusters=2, random_state=100)
X = [[0, 0, 1, 0], [1, 0, 0, 1], [0, 0, 0, 1], [1, 1, 1, 0], [0, 0, 0, 0]]
model.fit(X)

Out:

KMeans(copy_x=True, init='k-means++', max_iter=300, n_clusters=2, n_init=10,
    n_jobs=1, precompute_distances='auto', random_state=100, tol=0.0001,
    verbose=0)

Continuing:

joblib.dump(model, 'model.pkl')
model_loaded = joblib.load('model.pkl')

model_loaded

Out:

KMeans(copy_x=True, init='k-means++', max_iter=300, n_clusters=2, n_init=10,
    n_jobs=1, precompute_distances='auto', random_state=100, tol=0.0001,
    verbose=0)

See how the n_clusters and random_state parameters are the same between the model and model_loaded objects? You're good to go.

Predict with the "new" model:

model_loaded.predict([[0,0,0,0]])  # predict expects a 2D array: one row per observation

Out[64]: array([0])
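As a sanity check on the "does it recluster from scratch?" part of the question, the sketch below compares predict on a reloaded model with a manual nearest-centroid assignment and confirms that the stored centers are left untouched. It assumes a reasonably recent scikit-learn and NumPy; the second new observation is made up for illustration, and pickle is round-tripped in memory only to avoid writing a file.

```python
import pickle
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[0, 0, 1, 0], [1, 0, 0, 1], [0, 0, 0, 1],
              [1, 1, 1, 0], [0, 0, 0, 0]], dtype=float)
model = KMeans(n_clusters=2, random_state=100, n_init=10).fit(X)

# Round-trip through pickle in memory (stands in for dump/load to disk)
model_loaded = pickle.loads(pickle.dumps(model))
centers_before = model_loaded.cluster_centers_.copy()

# New observations; the second row is a made-up example
new_obs = np.array([[0, 0, 0, 0], [1, 1, 0, 1]], dtype=float)
predicted = model_loaded.predict(new_obs)

# Manual nearest-centroid assignment using only the stored centers
dists = np.linalg.norm(new_obs[:, None, :] - centers_before[None, :, :], axis=2)
manual = dists.argmin(axis=1)

same_labels = np.array_equal(predicted, manual)
centers_unchanged = np.array_equal(centers_before, model_loaded.cluster_centers_)
print(same_labels, centers_unchanged)
```

If both values print as True, predict is behaving as a pure nearest-centroid lookup against the existing clusters rather than refitting on the new data.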
