Python SKLearn: How to get feature names after OneHotEncoder?


Problem description

I would like to get the feature names of a data set after it has been transformed by the SKLearn OneHotEncoder.

The discussion of the active_features_ attribute in OneHotEncoder gives a very good explanation of how the attributes n_values_, feature_indices_ and active_features_ get filled after transform() has been executed.

My question is:

For example, given DataFrame-based input data:

data = pd.DataFrame({"a": [0, 1, 2, 0], "b": [0, 1, 4, 5], "c": [0, 1, 4, 5]}).to_numpy()  # as_matrix() was removed in pandas 1.0

What would the code look like to get from the original feature names a, b and c to a list of the transformed feature names (e.g.:

a-0, a-1, a-2, b-0, b-1, b-2, b-3, c-0, c-1, c-2, c-3

or

a-0, a-1, a-2, b-0, b-1, b-2, b-3, b-4, b-5, b-6, b-7, b-8

or anything that helps to see the assignment of encoded columns to the original columns)?

Background: I would like to see the feature importances of some of the algorithms to get a feeling for which features have the most effect on the algorithm used.

Recommended answer

You can use pd.get_dummies():

pd.get_dummies(df["a"], prefix="a")  # df is the DataFrame before it is converted to a NumPy array

will give you:

    a_0 a_1 a_2
0   1   0   0
1   0   1   0
2   0   0   1
3   1   0   0

which automatically generates the column names. You can apply this to all your columns and then get the column names. There is no need to convert them to a numpy matrix.
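As a minimal sketch of applying this to every column at once, pd.get_dummies() also accepts a whole DataFrame together with a columns argument. The names encoded and origin below are illustrative, and the mapping back to the original column assumes that the default "_" separator does not occur inside the category values themselves.

```python
import pandas as pd

df = pd.DataFrame({"a": [0, 1, 2, 0], "b": [0, 1, 4, 5], "c": [0, 1, 4, 5]})

# Passing columns= forces encoding of the integer columns, which
# get_dummies would otherwise leave untouched.
encoded = pd.get_dummies(df, columns=list(df.columns))

# Map each encoded column back to the original column it came from.
# rsplit on the last "_" assumes the category values contain no "_".
origin = {col: col.rsplit("_", 1)[0] for col in encoded.columns}

print(list(encoded.columns))
print(origin)
```

The origin dictionary is one way to see which encoded column belongs to which original column, for example when grouping per-column scores by source feature.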

So, given:

df = pd.DataFrame({"a": [0, 1, 2, 0], "b": [0, 1, 4, 5], "c": [0, 1, 4, 5]})
data = df.to_numpy()  # as_matrix() was removed in pandas 1.0

the solution looks like this:

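import pandas as pd

# One-hot encode each column separately; the prefix keeps the original column name
encoded = [pd.get_dummies(df[col], prefix=col) for col in df.columns]
# Join the per-column results side by side into one DataFrame
my_result = pd.concat(encoded, axis=1)
print(my_result.columns)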

>>Index(['a_0', 'a_1', 'a_2', 'b_0', 'b_1', 'b_4', 'b_5', 'c_0', 'c_1', 'c_4',
       'c_5'],
      dtype='object')
