
Problem description

I'd like to use the warm_start parameter to add training data to my random forest classifier. I expected it to be used like this:

clf = RandomForestClassifier(...)
clf.fit(get_data())
clf.fit(get_more_data(), warm_start=True)

But the warm_start parameter is a constructor parameter. So do I do something like this?

clf = RandomForestClassifier()
clf.fit(get_data())
clf = RandomForestClassifier(warm_start=True)
clf.fit(get_more_data())

That makes no sense to me. Won't the new call to the constructor discard previous training data? I think I'm missing something.

Recommended answer

The basic pattern (taken from Miriam's answer):

clf = RandomForestClassifier(warm_start=True)
clf.fit(get_data())
clf.fit(get_more_data())

would be the correct usage API-wise.

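But there is a problem here.

As the docs say:

When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new forest.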

It means that the only thing warm_start can do for you is add new decision trees. All the previous trees appear to be untouched!
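A minimal sketch to see this in practice. The synthetic data from make_classification, the split into two batches, and n_estimators=5 are my own assumptions for illustration, not part of the question. It fits twice with warm_start=True without changing n_estimators, then checks whether the tree objects are still the very same ones:

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-ins for get_data() / get_more_data(): two batches
# drawn from the same synthetic problem, so both contain the same classes.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_first, y_first = X[:100], y[:100]
X_more, y_more = X[100:], y[100:]

clf = RandomForestClassifier(n_estimators=5, warm_start=True, random_state=0)
clf.fit(X_first, y_first)
trees_before = list(clf.estimators_)

# Second fit with the same n_estimators: record any warnings it raises.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    clf.fit(X_more, y_more)

trees_after = list(clf.estimators_)

# True when every tree is the identical object from the first fit.
same_trees = all(a is b for a, b in zip(trees_before, trees_after))
warned = any("Warm-start" in str(w.message) for w in caught)
print(len(trees_after), same_trees, warned)
```

If the forest still holds the same five tree objects, the second batch never influenced the model at all, which is exactly the problem described above.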

Let's check this against the source:

n_more_estimators = self.n_estimators - len(self.estimators_)

if n_more_estimators < 0:
    raise ValueError('n_estimators=%d must be larger or equal to '
                     'len(estimators_)=%d when warm_start==True'
                     % (self.n_estimators, len(self.estimators_)))

elif n_more_estimators == 0:
    warn("Warm-start fitting without increasing n_estimators does not "
         "fit new trees.")

This basically tells us that you would need to increase the number of estimators before attempting a new fit!

I have no idea what kind of usage sklearn expects here. I'm not sure whether fitting, increasing internal variables, and fitting again is correct usage, but I somehow doubt it (especially as n_estimators is not a public class variable).
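For what it's worth, set_params is the generic sklearn way to change a constructor parameter on an existing estimator, so one way to raise the tree count between fits can be sketched as below. The synthetic data, the batch split, and the counts 5 and 8 are assumptions of mine for illustration; I am not claiming this is the usage the library intends.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-ins for get_data() / get_more_data(): two batches
# drawn from the same synthetic problem, so both contain the same classes.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_first, y_first = X[:100], y[:100]
X_more, y_more = X[100:], y[100:]

clf = RandomForestClassifier(n_estimators=5, warm_start=True, random_state=0)
clf.fit(X_first, y_first)
n_before = len(clf.estimators_)

# Ask for 3 more trees, then fit on the second batch only.
clf.set_params(n_estimators=8)
clf.fit(X_more, y_more)
n_after = len(clf.estimators_)

print(n_before, n_after)
```

Note what this does and does not give you: the three added trees are trained only on the second batch, and the first five trees never see it. No single tree learns from all the data, so this is not incremental learning in the usual sense.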

Your basic approach (with regard to this library and this classifier) is probably not a good idea for your out-of-core learning! I would not pursue this further.
