Problem description
I have a dataset that has already been split into three sets: train, validation, and test. These sets have to be used as given so that performance can be compared across different algorithms.
I would now like to optimize the parameters of my SVM using the validation set. However, I cannot find how to pass the validation set explicitly into sklearn.grid_search.GridSearchCV(). Below is some code I previously used for K-fold cross-validation on the training set. For this problem, however, I need to use the validation set as given. How can I do that?
from sklearn import svm, cross_validation
from sklearn.grid_search import GridSearchCV
# (some code left out to simplify things)
skf = cross_validation.StratifiedKFold(y_train, n_folds=5, shuffle=True)
clf = GridSearchCV(svm.SVC(tol=0.005, cache_size=6000,
                           class_weight=penalty_weights),
                   param_grid=tuned_parameters,
                   n_jobs=2,
                   pre_dispatch="n_jobs",
                   cv=skf,
                   scoring=scorer)
clf.fit(X_train, y_train)
Recommended answer
Use
ps = PredefinedSplit(test_fold=your_test_fold)
then set cv=ps in GridSearchCV.
test_fold[i] gives the test set fold of sample i. A value of -1 indicates that the corresponding sample is not part of any test set fold, but will instead always be put into the training fold.
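To make the test_fold convention concrete, here is a minimal sketch. It assumes a recent scikit-learn release, where PredefinedSplit and GridSearchCV are imported from sklearn.model_selection rather than the older sklearn.grid_search module used in the question. The synthetic data, the 80/40 split sizes, and the small parameter grid are all hypothetical stand-ins for the asker's real train and validation sets.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, PredefinedSplit
from sklearn.svm import SVC

# Hypothetical stand-ins for the predefined train and validation sets.
X, y = make_classification(n_samples=120, n_features=5, random_state=0)
X_train, y_train = X[:80], y[:80]
X_val, y_val = X[80:], y[80:]

# GridSearchCV takes a single array, so stack the two sets and let
# PredefinedSplit say which rows form the validation fold.
X_all = np.concatenate([X_train, X_val])
y_all = np.concatenate([y_train, y_val])

# -1: sample is always in the training fold; 0: sample is in validation fold 0.
test_fold = np.concatenate([
    np.full(len(X_train), -1),
    np.zeros(len(X_val), dtype=int),
])
ps = PredefinedSplit(test_fold=test_fold)

# Illustrative grid only; substitute your own tuned_parameters.
tuned_parameters = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
clf = GridSearchCV(SVC(), param_grid=tuned_parameters, cv=ps)
clf.fit(X_all, y_all)
print(clf.best_params_)
```

Because every validation row carries fold index 0 and every training row carries -1, the search evaluates each parameter combination on exactly one split: fit on the training rows, score on the validation rows. Note that with the default refit=True, the best estimator is refit on everything passed to fit, here train plus validation; pass refit=False if that is not wanted.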