randomForest package prediction: newdata argument?

Problem Description

I've just recently started playing around with the randomForest package in R. After growing my forest, I tried predicting the response using the same dataset (i.e., the training dataset), which gave me a confusion matrix different from the one that was printed with the forest object itself. I thought there might be something wrong with the newdata argument, but I followed the example given in the documentation to a T and it gave the same problem. Here's an example using the iris dataset (with Species as the response). This is the same example the authors used in their documentation, except that I use the same dataset to train and predict. So the question here is: why are those two confusion matrices not identical?

library(randomForest)
data(iris)
set.seed(111)
ind <- sample(2, nrow(iris), replace = TRUE, prob=c(0.8, 0.2))
# grow the forest on the training rows (ind == 1)
iris.rf <- randomForest(Species ~ ., data=iris[ind == 1,])
print(iris.rf)

Call:
 randomForest(formula = Species ~ ., data = iris[ind == 1, ])
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 2

        OOB estimate of  error rate: 3.33%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         45          0         0  0.00000000
versicolor      0         39         1  0.02500000
virginica       0          3        32  0.08571429

# predict on the training data again
iris.pred <- predict(iris.rf, newdata = iris[ind == 1, ])
table(observed = iris[ind==1, "Species"], predicted = iris.pred)

           predicted
observed     setosa versicolor virginica
  setosa         45          0         0
  versicolor      0         40         0
  virginica       0          0        35
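A minimal sketch for checking where each of the two tables comes from, reusing the same split as above. It relies on the documented behaviour of predict.randomForest: when newdata is omitted, it returns the out-of-bag (OOB) predictions stored in the forest object, whereas passing newdata makes every tree vote on every row, including the trees that were grown on those rows. The names train, oob.pred, insample.pred, oob.tab and insample.tab are illustrative, and the code assumes the randomForest package is installed.

```r
library(randomForest)

data(iris)
set.seed(111)
ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.8, 0.2))
train <- iris[ind == 1, ]  # illustrative name for the training rows
iris.rf <- randomForest(Species ~ ., data = train)

# Omitting newdata returns the stored out-of-bag (OOB) predictions
oob.pred <- predict(iris.rf)
oob.tab <- table(observed = train$Species, predicted = oob.pred)

# Supplying newdata lets all trees vote, including trees that saw these rows
insample.pred <- predict(iris.rf, newdata = train)
insample.tab <- table(observed = train$Species, predicted = insample.pred)

# Compare the OOB table with the confusion matrix stored in the forest
# (column 4 of iris.rf$confusion is class.error, so it is dropped)
print(all(oob.tab == iris.rf$confusion[, 1:3]))
print(oob.tab)
print(insample.tab)
```

If the comparison prints TRUE, the matrix shown by print(iris.rf) is the OOB one, while the table built with newdata is an in-sample fit and typically looks more optimistic.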

Recommended Answer