Problem description
I am training a random forest on my training data, which has 114954 rows and 135 columns (predictors), and I am getting the following error.
model <- randomForest(u_b_stars~. ,data=traindata,importance=TRUE,do.trace=100, keep.forest=TRUE, mtry=30)
Error: cannot allocate vector of size 877.0 Mb
In addition: Warning messages:
1: In randomForest.default(m, y, ...) :
The response has five or fewer unique values. Are you sure you want to do regression?
2: In matrix(double(nrnodes * nt), ncol = nt) :
Reached total allocation of 3958Mb: see help(memory.size)
3: In matrix(double(nrnodes * nt), ncol = nt) :
Reached total allocation of 3958Mb: see help(memory.size)
4: In matrix(double(nrnodes * nt), ncol = nt) :
Reached total allocation of 3958Mb: see help(memory.size)
5: In matrix(double(nrnodes * nt), ncol = nt) :
Reached total allocation of 3958Mb: see help(memory.size)
What can I do to avoid this error? Should I train on less data? That won't be good, of course. Can somebody suggest an alternative that doesn't require dropping rows from the training data? I want to use the complete training set.
Recommended answer
As was stated in an answer to a previous question (which I can't find now), increasing the sample size affects the memory requirements of RF in a nonlinear way. Not only is the model matrix larger, but the default size of each tree, which is based on the number of points per leaf, is also larger.
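As a rough check on where the 877.0 Mb in the error comes from, the arithmetic can be sketched as below. It assumes the call uses the default ntree of 500 (none is passed in the question) and that, at the default nodesize, about 2n + 1 node slots are reserved per tree. The second figure is an assumption about the package internals, not something stated in the question, although it matches the matrix(double(nrnodes * nt), ncol = nt) allocation named in the warnings.

```r
# Back-of-the-envelope size of one nrnodes x ntree matrix of doubles.
n_rows  <- 114954          # rows in the training data (from the question)
ntree   <- 500             # assumed: randomForest default, not set in the call
nrnodes <- 2 * n_rows + 1  # assumed upper bound on nodes per tree at default nodesize

bytes <- nrnodes * ntree * 8   # 8 bytes per double
mb    <- bytes / 1024^2
size_label <- sprintf("%.1f Mb", mb)
print(size_label)
```

Under those assumptions the result matches the size in the error message, which suggests that either fewer trees per fit or fewer possible nodes per tree would shrink this allocation.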
To fit the model given your memory constraints, you can do the following:
- Increase the nodesize parameter to something bigger than the default, which is 5 for a regression RF. With 114k observations, you should be able to increase this significantly without hurting performance.
- Reduce the number of trees per RF with the ntree parameter. Fit several small RFs, then merge them with combine() to produce the entire forest.
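The two suggestions above could be put together roughly as follows. This is a minimal sketch, not a tuned recipe: the data frame is a small synthetic stand-in for traindata, and the values nodesize = 50, ntree = 50, mtry = 3 and the helper fit_small_forest are illustrative assumptions to adapt to the real data.

```r
library(randomForest)

set.seed(42)
# Synthetic stand-in for traindata (hypothetical; the real data has
# 114954 rows and 135 predictors).
n <- 2000
traindata <- data.frame(matrix(rnorm(n * 10), ncol = 10))
traindata$u_b_stars <- 2 * traindata$X1 + rnorm(n)

# Fit one small forest with larger terminal nodes than the default.
# fit_small_forest is a hypothetical helper, not part of the package.
fit_small_forest <- function(seed) {
  set.seed(seed)  # different seed per forest so the trees differ
  randomForest(u_b_stars ~ ., data = traindata,
               ntree = 50, nodesize = 50, mtry = 3,
               importance = TRUE, keep.forest = TRUE)
}

# Four forests of 50 trees each, fitted one after another.
forests <- lapply(1:4, fit_small_forest)

# Merge them into a single forest of 200 trees.
model <- do.call(randomForest::combine, forests)
total_trees <- model$ntree
```

Each call only has to allocate space for 50 trees at a time, and the larger nodesize caps how deep each tree can grow. One trade-off to keep in mind: the combined object no longer carries out-of-bag error summaries such as mse and rsq, so performance would need to be checked on held-out data.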