How to view random forest statistics in Spark (Scala)

Problem description

I have a `RandomForestClassificationModel` in Spark. Calling `.toDebugString` on it prints the following:

```
Tree 0 (weight 1.0):
  If (feature 0 in {1.0,2.0,3.0})
   If (feature 3 in {2.0,3.0})
    If (feature 8 <= 55.3)
    ...
  Else (feature 0 not in {1.0,2.0,3.0})
  ...
Tree 1 (weight 1.0):
...etc
```

I'd like to see the actual data as it passes through the model, something like this:

```
Tree 0 (weight 1.0):
  If (feature 0 in {1.0,2.0,3.0}) 60%
   If (feature 3 in {2.0,3.0}) 57%
    If (feature 8 <= 55.3) 22%
    ...
  Else (feature 0 not in {1.0,2.0,3.0}) 40%
  ...
Tree 1 (weight 1.0):
...etc
```

By seeing the probability of each label at every node, I could see which paths through the trees the data (thousands of records) most often follows, which would be a really valuable insight!

I found a great answer here: "Spark MLib Decision Trees: Probability of labels by features?"

Unfortunately, the method in that answer uses the RDD-based MLlib API. After a lot of trying, I have failed to replicate it with the DataFrame-based API, whose `Node` and `Split` classes are implemented differently :(

Recommended answer

One approach I found useful yesterday: use `spark.read.parquet()` to read the node data that a saved model writes to its `data` subdirectory. This way all the information about every node can be retrieved as a single DataFrame.

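```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder().getOrCreate() // already defined in spark-shell
// Directory the model was saved to, e.g. via model.write.overwrite().save(modelPath)
val modelPath = "some/path/to/your/model"
val dataPath = modelPath + "/data"
// One row per node: the tree's treeID plus a nodeData struct with that node's fields
val nodeData: DataFrame = spark.read.parquet(dataPath)
nodeData.show(500, false)
nodeData.printSchema()
```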
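Among the fields that `printSchema()` shows, each node row carries an `impurityStats` array. For a classifier trained with Gini or entropy impurity, this should hold the node's per-class (weighted) counts. Note that for a random forest these counts come from each tree's bootstrapped training sample, not from new data passed through the model. Assuming that, the label probabilities asked about in the question follow from normalising the counts. Below is a minimal sketch in plain Scala; `NodeLabels` and its methods are hypothetical names, not part of Spark:

```scala
// Hypothetical helpers. `counts` is assumed to be one node's impurityStats
// array, where index i holds the (weighted) count of rows with label i.0.
object NodeLabels {
  // Probability of each label at the node: its count divided by the node total.
  def labelProbabilities(counts: Array[Double]): Array[Double] = {
    val total = counts.sum
    if (total == 0.0) counts.map(_ => 0.0) else counts.map(_ / total)
  }

  // Index of the most frequent label at the node (ties go to the lowest index).
  def majorityLabel(counts: Array[Double]): Int = counts.indexOf(counts.max)
}
```

For example, a node whose `impurityStats` is `[30.0, 10.0]` gives label probabilities `[0.75, 0.25]`.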

From those rows you can then rebuild each tree together with its per-node information. Hope it helps.
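As a rough illustration of that rebuilding step, the sketch below takes the node rows of a single tree (already collected into plain Scala values) and prints them in the style of `toDebugString`. Each branch is annotated with the share of the root's records that reach it, similar to the output the question asks for. The sketch assumes the following, which you should check against your Spark version's saved format:

- node ids are unique within a tree and the root has id 0;
- leaves have `leftChild == -1`;
- `numCategories == -1` marks a continuous split whose threshold is the single value in `leftCategoriesOrThreshold`;
- summing `impurityStats` gives the (training) record count at a node.

`SavedNode` and `TreeStats` are hypothetical names. Compare the field names against your own `printSchema()` output.

```scala
// Hypothetical mirror of one flattened node row (the nodeData struct fields,
// with the nested split fields pulled up); verify names with printSchema().
case class SavedNode(
    id: Int,
    prediction: Double,
    impurityStats: Array[Double],             // assumed: per-class counts for classifiers
    leftChild: Int,                           // -1 for leaf nodes
    rightChild: Int,                          // -1 for leaf nodes
    featureIndex: Int,                        // split.featureIndex
    leftCategoriesOrThreshold: Array[Double], // split.leftCategoriesOrThreshold
    numCategories: Int                        // split.numCategories; -1 = continuous split
)

object TreeStats {
  private def cats(xs: Array[Double]): String = xs.mkString("{", ",", "}")

  // Condition that sends a record to the left child, in toDebugString's style.
  def leftCondition(n: SavedNode): String =
    if (n.numCategories == -1) s"feature ${n.featureIndex} <= ${n.leftCategoriesOrThreshold(0)}"
    else s"feature ${n.featureIndex} in ${cats(n.leftCategoriesOrThreshold)}"

  // Complementary condition for the right child.
  def rightCondition(n: SavedNode): String =
    if (n.numCategories == -1) s"feature ${n.featureIndex} > ${n.leftCategoriesOrThreshold(0)}"
    else s"feature ${n.featureIndex} not in ${cats(n.leftCategoriesOrThreshold)}"

  // Renders one tree; each branch is annotated with the percentage of the
  // root's (training) records reaching it, computed from summed class counts.
  def render(nodes: Seq[SavedNode], rootId: Int = 0): String = {
    val byId = nodes.map(n => n.id -> n).toMap
    val rootTotal = byId(rootId).impurityStats.sum
    def pct(id: Int): String = f"${100.0 * byId(id).impurityStats.sum / rootTotal}%.0f%%"
    val out = new StringBuilder
    def walk(id: Int, indent: String): Unit = {
      val n = byId(id)
      if (n.leftChild == -1) {
        out.append(s"${indent}Predict: ${n.prediction}\n")
      } else {
        out.append(s"${indent}If (${leftCondition(n)}) ${pct(n.leftChild)}\n")
        walk(n.leftChild, indent + " ")
        out.append(s"${indent}Else (${rightCondition(n)}) ${pct(n.rightChild)}\n")
        walk(n.rightChild, indent + " ")
      }
    }
    walk(rootId, "  ")
    out.toString
  }
}
```

To feed it, one option is to filter `nodeData` to one `treeID`, select the nested `nodeData.*` and `nodeData.split.*` fields, and `collect()` the rows into `SavedNode` values. Because the counts come from the training sample, the percentages describe how the training data was split, not how new records would be routed.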

08-13 19:26