本文介绍了Pipe-lining Standardscaler、递归特征选择和分类器的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
我有一个给定的数据集 X 和 Y.我想使用管道实现以下步骤:
I have a given dataset, X and Y.I want to implement the following steps using pipeline:
- Standardscaler
- Recursive feature selection
- RandomForestClassifier
- cross-validation predict
我实现如下:
import numpy as np
from sklearn.feature_selection import RFE, RFECV
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict, KFold
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.datasets import load_iris
data = load_iris()
X = data.data
Y = data.target
print X.shape
print Y.shape
clf = RandomForestClassifier(n_estimators=50,max_features=None,n_jobs=-1,random_state=0)
kf = KFold(n_splits=2, shuffle=True, random_state=0)
pipeline = Pipeline([('standardscaler', StandardScaler()),
('rfecv', RFECV(estimator=clf, step=1, cv=kf, scoring='accuracy', n_jobs=7)),
('clf', clf)])
pipeline.fit(X,Y)
ypredict = cross_val_predict(pipeline, X, Y, cv=kf)
accuracy = accuracy_score(Y, ypredict)
print (accuracy)
请深入研究我的实现,让我知道我的代码哪里有问题.谢谢.
Please look into my implementation deeply, and let me know where is wrong with my code. Thank you.
推荐答案
这有效.pipeline
中的最终估计器只需要实现 fit
,而 REFCV
所做的.代码如下:
This works. The final estimator in the pipeline
only needs to implement fit
which REFCV
does. Here's the code:
from sklearn.feature_selection import RFECV
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict, KFold
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.datasets import load_iris
data = load_iris()
X = data.data
Y = data.target
clf = RandomForestClassifier()
# create pipeline
estimators = [('standardize' , StandardScaler()),
('rfecv', RFECV(estimator=clf, scoring='accuracy'))]
# build the pipeline
pipeline = Pipeline(estimators)
# run the pipeline
kf = KFold(n_splits=2, shuffle=True, random_state=0)
ypredict = cross_val_predict(pipeline, X, Y, cv=kf)
accuracy = accuracy_score(Y, ypredict)
print (accuracy)
'Output':
0.96
这篇关于Pipe-lining Standardscaler、递归特征选择和分类器的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!