Is there a limit on the training data size for random forest in R?

Problem Description

I am training a randomForest model on my training data, which has 114954 rows and 135 columns (predictors), and I am getting the following error:

model <- randomForest(u_b_stars ~ ., data = traindata, importance = TRUE, do.trace = 100, keep.forest = TRUE, mtry = 30)

Error: cannot allocate vector of size 877.0 Mb
In addition: Warning messages:
1: In randomForest.default(m, y, ...) :
The response has five or fewer unique values.  Are you sure you want to do regression?
2: In matrix(double(nrnodes * nt), ncol = nt) :
Reached total allocation of 3958Mb: see help(memory.size)
3: In matrix(double(nrnodes * nt), ncol = nt) :
Reached total allocation of 3958Mb: see help(memory.size)
4: In matrix(double(nrnodes * nt), ncol = nt) :
Reached total allocation of 3958Mb: see help(memory.size)
5: In matrix(double(nrnodes * nt), ncol = nt) :
Reached total allocation of 3958Mb: see help(memory.size)

I want to know what I can do to avoid this error. Should I train it on less data? That wouldn't be good, of course. Can somebody suggest an alternative in which I don't have to take less data from the training data? I want to use the complete training data.

Recommended Answer

As was stated in an answer to a previous question (which I can't find now), increasing the sample size affects the memory requirements of a random forest in a nonlinear way. Not only is the model matrix larger, but the default size of each tree, which depends on the number of points per leaf, is also larger.

To fit the model within your memory constraints, you can do the following:

  1. Increase the nodesize parameter to something bigger than the default, which is 5 for a regression RF. With 114k observations, you should be able to increase this significantly without hurting performance.

  2. Reduce the number of trees per RF with the ntree parameter. Fit several small RFs, then merge them with combine to produce the entire forest (see the sketch after this list).
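A minimal sketch of both suggestions, reusing the call from the question; the nodesize and ntree values are illustrative, not tuned for this data set:

library(randomForest)

# Grow two smaller forests: a larger nodesize limits the size of each tree,
# and a smaller ntree limits the memory needed per forest. (Values are
# placeholders; adjust for your data and memory budget.)
rf1 <- randomForest(u_b_stars ~ ., data = traindata, ntree = 100,
                    nodesize = 50, mtry = 30, keep.forest = TRUE)
rf2 <- randomForest(u_b_stars ~ ., data = traindata, ntree = 100,
                    nodesize = 50, mtry = 30, keep.forest = TRUE)

# Merge the small forests into a single 200-tree ensemble.
model <- randomForest::combine(rf1, rf2)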

