Proximity matrix in sklearn.ensemble.RandomForestClassifier

Question

I'm trying to perform clustering in Python using random forests. In the R implementation of random forests, there is a flag you can set to get the proximity matrix. I can't seem to find anything similar in the scikit-learn version of random forests. Does anyone know whether there is an equivalent calculation for the Python version?

Recommended answer

We don't implement a proximity matrix in scikit-learn (yet).

However, this can be done by relying on the apply function provided in our implementation of decision trees. That is, for every pair of samples in your dataset, iterate over the decision trees in the forest (through forest.estimators_) and count the number of times the two samples fall in the same leaf, i.e., the number of times apply gives the same node id for both samples in the pair.
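The counting procedure described above can be sketched roughly as follows. This is only a minimal illustration, not an official scikit-learn feature: the helper name `proximity_matrix`, its `normalize` option, and the synthetic data from `make_classification` are assumptions made for the example. Dividing the counts by the number of trees is one common convention for scaling proximities to the range 0 to 1.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def proximity_matrix(forest, X, normalize=True):
    """Hypothetical helper: count how often each pair of samples shares a leaf."""
    n_samples = X.shape[0]
    prox = np.zeros((n_samples, n_samples))
    for tree in forest.estimators_:
        # apply() returns, for each sample, the id of the leaf it ends up in
        leaves = tree.apply(X)
        # Pairwise comparison: True where two samples land in the same leaf
        prox += leaves[:, None] == leaves[None, :]
    if normalize:
        # Assumed convention: express counts as a fraction of the trees
        prox /= len(forest.estimators_)
    return prox


# Synthetic data, used only to make the sketch runnable
X, y = make_classification(n_samples=60, n_features=8, random_state=0)
forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)
prox = proximity_matrix(forest, X)
```

The result is a symmetric matrix with ones on the diagonal. For clustering, one option is to treat `1 - prox` as a dissimilarity and pass it to an algorithm that accepts precomputed distances. Note that the matrix has n_samples × n_samples entries, so memory grows quadratically with the dataset size. Recent scikit-learn versions also expose `apply` on the forest itself, returning the leaf ids for all trees at once, which may avoid the explicit loop over `forest.estimators_`.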

Hope this helps.
