How to get the OOB samples used for each tree in a random forest model in R?

Question

Is it possible to get the OOB samples that the random forest algorithm uses for each tree? I'm using R. I know that the random forest algorithm grows each tree on roughly 66% of the data (selected at random) and keeps the remaining 34% or so as OOB samples for measuring the OOB error, but I don't know how to get those OOB samples for each tree.

Any ideas?

Recommended answer

Assuming you are using the randomForest package, you just need to set the keep.inbag argument to TRUE.

library(randomForest)
set.seed(1)
rf <- randomForest(Species ~ ., iris, keep.inbag = TRUE)

The returned list will contain an n-by-ntree matrix (one row per observation, one column per tree) that can be accessed by the name inbag.

dim(rf$inbag)
# [1] 150 500

rf$inbag[1:5, 1:3]
#   [,1] [,2] [,3]
# 1    0    1    0
# 2    1    1    0
# 3    1    0    1
# 4    1    0    1
# 5    0    0    2

The values in the matrix tell you how many times each observation was in-bag for each tree. For example, the value 2 in row 5, column 3 above means that the 5th observation was drawn twice for the 3rd tree. A value of 0 means the observation was not drawn for that tree, so it is an OOB sample for that tree.
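Since the question asks for the OOB samples rather than the in-bag ones, the zero entries of the matrix are the part of interest. The following is a minimal sketch of one way to pull them out, assuming a version of randomForest that stores in-bag counts as shown above; the names oob_mask, oob_tree3 and oob_by_tree are made up for this illustration.

```r
library(randomForest)

set.seed(1)
rf <- randomForest(Species ~ ., iris, keep.inbag = TRUE)

# TRUE where an observation was never drawn for a tree, i.e. it is OOB for that tree
oob_mask <- rf$inbag == 0

# Row indices of the OOB observations for the 3rd tree
oob_tree3 <- which(oob_mask[, 3])

# A list with one vector of OOB row indices per tree
oob_by_tree <- lapply(seq_len(ncol(rf$inbag)), function(k) which(oob_mask[, k]))

# The OOB rows of the training data for the 3rd tree
head(iris[oob_tree3, ])
```

Keeping the result as a list rather than a matrix is a deliberate choice here, because the number of OOB observations generally differs from tree to tree.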

As a bit of background, an observation can show up in-bag more than once (hence the 2) because by default the sampling is done with replacement.
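To illustrate what sampling with replacement implies for the in-bag and OOB shares, here is a small base R simulation of a single bootstrap draw. It is only a sketch of the general bootstrap idea, not the package's internal code, and the variable names are chosen for this example.

```r
set.seed(1)
n <- 150  # same number of rows as iris

# Draw one bootstrap sample of row indices, as bagging does for a single tree
draw <- sample(n, size = n, replace = TRUE)

# Count how often each row was drawn; rows with a count of zero are OOB
counts <- tabulate(draw, nbins = n)

mean(counts > 0)   # share of rows that are in-bag at least once
mean(counts == 0)  # share of rows that are OOB
(1 - 1 / n)^n      # chance that a given row is OOB; close to exp(-1), about 0.368
```

In expectation, roughly 63% of the rows end up in-bag and roughly 37% are OOB, which is close to the 66%/34% split mentioned in the question.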

You can also sample without replacement by setting the replace argument to FALSE.

set.seed(1)
rf2 <- randomForest(Species ~ ., iris, keep.inbag = TRUE, replace = FALSE)

Now we can verify that, without replacement, no observation is included more than once in any tree.

# with replacement, the maximum number of times a sample is included in a tree is 7
max(rf$inbag)
# [1] 7

# without replacement, the maximum number of times a sample is included in a tree is 1
max(rf2$inbag)
# [1] 1
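A related check is how many draws each tree receives in the two modes. The sketch below assumes the package defaults for sampsize as I understand them from its documentation (nrow(x) draws with replacement, ceiling(.632 * nrow(x)) without) and uses a smaller ntree only to keep the run short; treat the expectations in the comments as assumptions to verify on your installed version.

```r
library(randomForest)

set.seed(1)
rf  <- randomForest(Species ~ ., iris, keep.inbag = TRUE, ntree = 50)
rf2 <- randomForest(Species ~ ., iris, keep.inbag = TRUE, replace = FALSE, ntree = 50)

# Total number of draws per tree; expected to equal nrow(iris) with replacement
range(colSums(rf$inbag))

# Expected to equal ceiling(0.632 * nrow(iris)) without replacement
range(colSums(rf2$inbag))
ceiling(0.632 * nrow(iris))

# Number of OOB observations per tree in each mode
summary(colSums(rf$inbag == 0))
summary(colSums(rf2$inbag == 0))
```

Without replacement, every tree has the same number of OOB observations, because each tree uses a fixed number of distinct rows.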


08-13 19:24