Pipelining StandardScaler, recursive feature selection, and a classifier

Problem description

I have a given dataset, X and Y. I want to implement the following steps using a pipeline:

- StandardScaler
- Recursive feature selection
- RandomForestClassifier
- Cross-validated prediction

My implementation is as follows:

import numpy as np
from sklearn.feature_selection import RFE, RFECV
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict, KFold
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.datasets import load_iris

data = load_iris()

X = data.data
Y = data.target

print(X.shape)
print(Y.shape)

clf = RandomForestClassifier(n_estimators=50,max_features=None,n_jobs=-1,random_state=0)
kf = KFold(n_splits=2, shuffle=True, random_state=0)
pipeline = Pipeline([('standardscaler', StandardScaler()),
                     ('rfecv', RFECV(estimator=clf, step=1, cv=kf, scoring='accuracy', n_jobs=7)),
                     ('clf', clf)])

pipeline.fit(X,Y)

ypredict = cross_val_predict(pipeline, X, Y, cv=kf)
accuracy = accuracy_score(Y, ypredict)

print(accuracy)

Please look closely at my implementation and let me know what is wrong with my code. Thank you.

Recommended answer

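This works. The final estimator in a pipeline only needs to implement fit, which RFECV does. RFECV also exposes predict (it delegates to the underlying estimator refit on the selected features), so cross_val_predict can call it directly and no separate classifier step is needed. Here's the code: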

from sklearn.feature_selection import RFECV
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict, KFold
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.datasets import load_iris

data = load_iris()

X = data.data
Y = data.target

clf = RandomForestClassifier()

# define the pipeline steps: scaling, then RFECV wrapping the classifier
estimators = [('standardize', StandardScaler()),
              ('rfecv', RFECV(estimator=clf, scoring='accuracy'))]

# build the pipeline; RFECV acts as the final estimator
pipeline = Pipeline(estimators)

# run the pipeline with 2-fold cross-validated predictions
kf = KFold(n_splits=2, shuffle=True, random_state=0)
ypredict = cross_val_predict(pipeline, X, Y, cv=kf)
accuracy = accuracy_score(Y, ypredict)

print(accuracy)

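Output (the exact value can vary from run to run, since RandomForestClassifier is created without a fixed random_state):
0.96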
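
If you also want to see which features the recursive selection kept, one option is to fit the pipeline once on the full data and read the fitted RFECV step. This is only a sketch, not part of the original answer: the fixed random_state, n_estimators=50, and the variable names selector and predictions are my own choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, Y = load_iris(return_X_y=True)

# random_state is fixed here so repeated runs agree (an assumption,
# the original answer leaves it unset)
clf = RandomForestClassifier(n_estimators=50, random_state=0)
kf = KFold(n_splits=2, shuffle=True, random_state=0)

# same two steps as in the answer: scaling, then RFECV as final estimator
pipeline = Pipeline([
    ('standardize', StandardScaler()),
    ('rfecv', RFECV(estimator=clf, step=1, cv=kf, scoring='accuracy')),
])
pipeline.fit(X, Y)

# the fitted RFECV step records the outcome of the feature selection
selector = pipeline.named_steps['rfecv']
print(selector.n_features_)  # number of features kept
print(selector.support_)     # boolean mask over the input features
print(selector.ranking_)     # rank 1 marks a selected feature

# predict goes through the scaler and then RFECV's own predict
predictions = pipeline.predict(X[:5])
print(predictions)
```

Note that this fit uses all of the data, so it is meant for inspecting the selection; the accuracy estimate should still come from cross_val_predict as in the answer.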

