How to print the order of important features in Random Forest regression using Python?

Question

I am trying to create a Random Forest regression model on one of my datasets. I need to find the order of importance of each variable, along with its name. I have tried a few things but can't achieve what I want. Below is the sample code I tried on the Boston Housing dataset:

# Note: load_boston was removed in scikit-learn 1.2, so this runs only on older versions.
from sklearn.datasets import load_boston
from sklearn.ensemble import RandomForestRegressor
import numpy as np

boston = load_boston()
rf = RandomForestRegressor(max_depth=50)
# Shuffled row indices (created in the original code but never used below).
idx = np.random.permutation(len(boston.target))
rf.fit(boston.data[:500], boston.target[:500])

# predict() expects a 2D array, so pass all three rows at once.
instance = boston.data[[0, 5, 10]]
print(rf.predict(instance))

# This loop collects every feature index in column order; the importance values are ignored.
important_features = []
for x, i in enumerate(rf.feature_importances_):
    important_features.append(str(x))
print('Most important features:', ', '.join(important_features))

Most important features: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12

If I print this:

impor = rf.feature_importances_
impor

I get the following output:

array([  3.45665230e-02,   4.58687594e-04,   5.45376404e-03,
     3.33388828e-04,   2.90936201e-02,   4.15908448e-01,
     1.04131089e-02,   7.26451301e-02,   3.51628079e-03,
     1.20860975e-02,   1.40417760e-02,   8.97546838e-03,
     3.92507707e-01])

I need to get the names associated with these values and then pick the top n out of these features.

Recommended answer

First, the loop does not rank anything: important_features collects the index of every feature in column order and never uses the values in feature_importances_. Second, feature_importances_ is an array of shape [n_features,] that holds the importance of each feature. Sort the features by these values to get the most important ones. See the RandomForestRegressor documentation.

Added code:

important_features_dict = {}
for idx, val in enumerate(rf.feature_importances_):
    important_features_dict[idx] = val

important_features_list = sorted(important_features_dict,
                                 key=important_features_dict.get,
                                 reverse=True)

print(f'5 most important features: {important_features_list[:5]}')

This prints the indices of the five most important features in decreasing order of importance (most important first).
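The question also asks for the feature names and a top-n selection, which the index-only ranking does not cover. A minimal sketch of one way to do that follows. It assumes the importance values shown in the question and the standard Boston Housing column names; the helper name top_features is hypothetical and not part of scikit-learn.

```python
# Importances copied from the question's output; in practice this would be
# rf.feature_importances_ from the fitted model.
importances = [
    3.45665230e-02, 4.58687594e-04, 5.45376404e-03, 3.33388828e-04,
    2.90936201e-02, 4.15908448e-01, 1.04131089e-02, 7.26451301e-02,
    3.51628079e-03, 1.20860975e-02, 1.40417760e-02, 8.97546838e-03,
    3.92507707e-01,
]

# Column names of the Boston Housing data, in column order; with the
# question's code these would come from boston.feature_names.
feature_names = [
    "CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
    "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT",
]


def top_features(importances, names, n):
    """Return the n (name, importance) pairs with the largest importance.

    Hypothetical helper for illustration; not a scikit-learn function.
    """
    if len(importances) != len(names):
        raise ValueError("importances and names must have the same length")
    # Pair each name with its importance, then sort by importance, largest first.
    ranked = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


top5 = top_features(importances, feature_names, 5)
for name, score in top5:
    print(f"{name}: {score:.4f}")
```

With a fitted model, the same helper could be called as top_features(rf.feature_importances_, boston.feature_names, n), since it only needs two sequences of equal length. Random forests are randomized, so the exact values (and possibly the order of close features) will differ between runs unless random_state is fixed.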
