Random Forest with GridSearchCV - Error on param_grid

Problem description

I'm trying to create a Random Forest model with GridSearchCV, but I'm getting an error pertaining to param_grid: "ValueError: Invalid parameter max_features for estimator Pipeline. Check the list of available parameters with `estimator.get_params().keys()`". I'm classifying documents, so I'm also putting a tf-idf vectorizer into the pipeline. Here is the code:

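from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
# scikit-learn 0.17 location; from 0.18 on this lives in sklearn.model_selection
from sklearn.grid_search import GridSearchCV
from sklearn.metrics import classification_report, f1_score, accuracy_score, precision_score, confusion_matrix
from sklearn.pipeline import Pipeline

# Classifier pipeline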
pipeline = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('classifier', RandomForestClassifier())
])
# Params for classifier
params = {"max_depth": [3, None],
              "max_features": [1, 3, 10],
              "min_samples_split": [1, 3, 10],
              "min_samples_leaf": [1, 3, 10],
              # "bootstrap": [True, False],
              "criterion": ["gini", "entropy"]}

# Grid Search Execute
rf_grid = GridSearchCV(estimator=pipeline , param_grid=params) #cv=10
rf_detector = rf_grid.fit(X_train, Y_train)
print(rf_grid.grid_scores_)

I can't figure out why the error is showing. The same thing happens, by the way, when I run a decision tree with GridSearchCV. (scikit-learn 0.17)

Recommended answer

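You have to assign the parameters to the named step in the pipeline, in your case classifier. A Pipeline exposes the parameters of its steps as <step name>__<parameter name>, so a bare max_features is looked up on the Pipeline itself, which has no such parameter. Try prepending classifier__ to each parameter name. Sample parameter grid: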

params = {"classifier__max_depth": [3, None],
          "classifier__max_features": [1, 3, 10],
          "classifier__min_samples_split": [1, 3, 10],
          "classifier__min_samples_leaf": [1, 3, 10],
          # "classifier__bootstrap": [True, False],
          "classifier__criterion": ["gini", "entropy"]}

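The prefixing rule can be sketched end to end as follows. This is only an illustration under some assumptions: prefix_params and demo are hypothetical helper names, the toy documents and labels are made up, and the imports assume scikit-learn 0.18 or later (where GridSearchCV lives in sklearn.model_selection) rather than the 0.17 used in the question. The grid starts min_samples_split at 2 because recent releases reject a value of 1, and it leaves out max_features because integer values would have to fit the tiny toy vocabulary.

```python
def prefix_params(step_name, params):
    """Return a copy of params with every key rewritten as '<step_name>__<key>'."""
    return {f"{step_name}__{name}": values for name, values in params.items()}


def demo():
    # Imported inside the function so prefix_params stays usable
    # even where scikit-learn is not installed.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline

    # Made-up toy corpus, only for illustration.
    docs = ["cheap pills online", "meeting at noon", "win money now",
            "lunch tomorrow?", "free prize inside", "project status update"]
    labels = [1, 0, 1, 0, 1, 0]

    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("classifier", RandomForestClassifier(n_estimators=10, random_state=0)),
    ])

    # The names the error message points to: every valid grid key is in
    # this set, for example 'classifier__max_depth'.
    valid_names = set(pipeline.get_params().keys())

    params = prefix_params("classifier", {
        "max_depth": [3, None],
        "min_samples_split": [2, 3],
        "criterion": ["gini", "entropy"],
    })

    # Fail early with a readable message if a key is not a pipeline parameter.
    unknown = set(params) - valid_names
    assert not unknown, f"not pipeline parameters: {unknown}"

    search = GridSearchCV(estimator=pipeline, param_grid=params, cv=3)
    search.fit(docs, labels)
    return search.best_params_
```

Calling demo() runs the search on the toy data and returns the best parameter combination. The same prefixing works for the vectorizer step, for instance prefix_params("tfidf", {"ngram_range": [(1, 1), (1, 2)]}), and the two dictionaries can be merged into one grid.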
