Problem description
I used the code below to create k-means clusters with scikit-learn.
kmean = KMeans(n_clusters=nclusters,n_jobs=-1,random_state=2376,max_iter=1000,n_init=1000,algorithm='full',init='k-means++')
kmean_fit = kmean.fit(clus_data)
I also retrieved the cluster centroids with kmean_fit.cluster_centers_
I then pickled the k-means object.
filename = pickle_path + '\\' + '_kmean_fit.sav'
# Use a context manager so the file handle is closed after writing
with open(filename, 'wb') as f:
    pickle.dump(kmean_fit, f)
That way I can later use kmean_fit.predict().
Questions:
- Will loading the pickled KMeans object and calling kmean_fit.predict() assign a new observation to the existing clusters based on their centroids, or does this approach just recluster from scratch on the new data?
- If this method won't work, how can I efficiently assign a new observation to the existing clusters in Python, given that I have already saved the cluster centroids?
PS: I know that building a classifier with the existing clusters as the dependent variable is another option, but I don't want to do that because of a time crunch.
Recommended answer
Yes. Whether or not the sklearn.cluster.KMeans object was pickled does not affect your ability to use the predict method to cluster a new observation: if you un-pickle it correctly, you are dealing with the "same" original object. predict does not refit the model; it assigns each new sample to the closest of the existing cluster centers.
An example:
from sklearn.cluster import KMeans
import joblib  # sklearn.externals.joblib has been removed from recent scikit-learn releases
model = KMeans(n_clusters = 2, random_state = 100)
X = [[0,0,1,0], [1,0,0,1], [0,0,0,1],[1,1,1,0],[0,0,0,0]]
model.fit(X)
Out:
KMeans(copy_x=True, init='k-means++', max_iter=300, n_clusters=2, n_init=10,
n_jobs=1, precompute_distances='auto', random_state=100, tol=0.0001,
verbose=0)
Continuing:
joblib.dump(model, 'model.pkl')
model_loaded = joblib.load('model.pkl')
model_loaded
Out:
KMeans(copy_x=True, init='k-means++', max_iter=300, n_clusters=2, n_init=10,
n_jobs=1, precompute_distances='auto', random_state=100, tol=0.0001,
verbose=0)
See how the n_clusters and random_state parameters are the same between the model and model_loaded objects? You're good to go.
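Matching constructor parameters does not by itself show that the fitted state survived the round trip. Here is a minimal sketch that compares the learned centroids and the predictions directly, assuming joblib is installed as a standalone package; the temporary file path is chosen purely for illustration:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.cluster import KMeans

X = [[0, 0, 1, 0], [1, 0, 0, 1], [0, 0, 0, 1], [1, 1, 1, 0], [0, 0, 0, 0]]
model = KMeans(n_clusters=2, random_state=100, n_init=10).fit(X)

# Hypothetical temporary location, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
joblib.dump(model, path)
model_loaded = joblib.load(path)

# The fitted centroids, not just the hyperparameters, should be identical.
centers_match = np.array_equal(model.cluster_centers_, model_loaded.cluster_centers_)
# Both objects should therefore assign every sample to the same cluster.
predictions_match = np.array_equal(model.predict(X), model_loaded.predict(X))
print(centers_match, predictions_match)
```

If both checks print True, the loaded object carries the same fitted clusters as the original.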
Predict with the "new" model:
model_loaded.predict([[0,0,0,0]])  # predict expects a 2D array: one row per observation
Out[64]: array([0])
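If only the centroids from cluster_centers_ were saved, the same assignment can be reproduced without the model object. Below is a minimal sketch, assuming Euclidean distance (the metric KMeans minimizes). The helper name assign_to_nearest and the toy data are hypothetical and not part of scikit-learn or the original question:

```python
import numpy as np
from sklearn.cluster import KMeans


def assign_to_nearest(points, centers):
    """Return, for each point, the index of the closest center (hypothetical helper)."""
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Pairwise squared Euclidean distances, shape (n_points, n_centers).
    diffs = points[:, np.newaxis, :] - centers[np.newaxis, :, :]
    sq_dists = (diffs ** 2).sum(axis=2)
    return sq_dists.argmin(axis=1)


# Illustrative, well-separated toy data (not from the original question).
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
model = KMeans(n_clusters=2, random_state=100, n_init=10).fit(X)
saved_centers = model.cluster_centers_  # what you would have stored on disk

new_points = [[0.5, 0.5], [9.0, 9.0]]
manual = assign_to_nearest(new_points, saved_centers)

# The manual nearest-centroid labels should agree with predict().
same = np.array_equal(manual, model.predict(new_points))
print(manual, same)
```

No reclustering happens in either path: the centroids stay fixed and each new observation simply receives the label of its nearest one. scikit-learn also provides sklearn.metrics.pairwise_distances_argmin, which performs the same nearest-center lookup.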