Problem description
I would like to get the feature names of a data set after it has been transformed by SKLearn's OneHotEncoder.
In "active_features_ attribute in OneHotEncoder" one can find a very good explanation of how the attributes n_values_, feature_indices_ and active_features_ get filled after transform() has been executed.
My question is:
For example, given DataFrame-based input data:
import pandas as pd
# as_matrix() was removed in pandas 1.0; to_numpy() is its replacement
data = pd.DataFrame({"a": [0, 1, 2, 0], "b": [0, 1, 4, 5], "c": [0, 1, 4, 5]}).to_numpy()
What does the code look like to get from the original feature names a, b and c to a list of the transformed feature names, for example:
a-0, a-1, a-2, b-0, b-1, b-2, b-3, c-0, c-1, c-2, c-3
or
a-0, a-1, a-2, b-0, b-1, b-2, b-3, b-4, b-5, b-6, b-7, b-8
or anything else that helps to see which encoded columns belong to which original column?
Background: I would like to see the feature importances of some of the algorithms, to get a feeling for which features have the most effect on the algorithm used.
Recommended answer
You can use pd.get_dummies():
pd.get_dummies(data["a"],prefix="a")
will give you:
   a_0  a_1  a_2
0    1    0    0
1    0    1    0
2    0    0    1
3    1    0    0
It generates the column names automatically. You can apply this to all your columns and then read off the column names. There is no need to convert the data to a NumPy matrix.
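So, given:
df = pd.DataFrame({"a": [0, 1, 2, 0], "b": [0, 1, 4, 5], "c": [0, 1, 4, 5]})
data = df.to_numpy()  # as_matrix() was removed in pandas 1.0; not needed for the steps that follow
the solution looks like this: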
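columns = df.columns
parts = []
for runner in columns:
    # one-hot encode one column; the new columns are named "<column>_<value>"
    parts.append(pd.get_dummies(df[runner], prefix=runner))
# join the per-column blocks side by side, keeping the original column order
my_result = pd.concat(parts, axis=1)
print(my_result.columns)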
>>Index(['a_0', 'a_1', 'a_2', 'b_0', 'b_1', 'b_4', 'b_5', 'c_0', 'c_1', 'c_4',
       'c_5'],
      dtype='object')
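As a possible shortcut, pd.get_dummies() can also encode the whole DataFrame in one call. The sketch below is not part of the original answer. It assumes the same df as above, uses prefix_sep="-" to get the a-0 style names from the question, and builds a dict (the name origin is only illustrative) that maps each encoded column back to its source column. The mapping relies on neither the column names' values nor the category values containing an extra "-" after the separator, so negative numbers would need a different separator.

```python
import pandas as pd

df = pd.DataFrame({"a": [0, 1, 2, 0], "b": [0, 1, 4, 5], "c": [0, 1, 4, 5]})

# Integer columns are not encoded by default, so list them explicitly.
# prefix_sep="-" produces names such as "a-0" instead of "a_0".
encoded = pd.get_dummies(df, columns=list(df.columns), prefix_sep="-")
feature_names = list(encoded.columns)

# Map every encoded column back to the original column it came from.
# Assumption: the category values themselves contain no "-".
origin = {name: name.rsplit("-", 1)[0] for name in feature_names}

print(feature_names)
print(origin["b-4"])
```

With such a mapping, the importances of all encoded columns that share an original column could be grouped together, which is what the background of the question asks for.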