Question
I would like to use the inverse_transform function for LabelEncoder on multiple columns.
This is the code I use for more than one column when applying LabelEncoder to a dataframe:
from sklearn.preprocessing import LabelEncoder

class MultiColumnLabelEncoder:
    def __init__(self, columns=None):
        self.columns = columns  # array of column names to encode

    def fit(self, X, y=None):
        return self  # not relevant here

    def transform(self, X):
        '''
        Transforms columns of X specified in self.columns using
        LabelEncoder(). If no columns specified, transforms all
        columns in X.
        '''
        output = X.copy()
        if self.columns is not None:
            for col in self.columns:
                output[col] = LabelEncoder().fit_transform(output[col])
        else:
            # items() replaces iteritems(), which was removed in pandas 2.0
            for colname, col in output.items():
                output[colname] = LabelEncoder().fit_transform(col)
        return output

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)
Is there a way to modify the code so that it can also be used to invert the labels from the encoder?
Thanks
Recommended answer
In order to inverse transform the data, you need to remember the encoders that were used to transform each column. One possible way to do this is to save the LabelEncoders in a dict inside your object. The way it would work:
- when you call fit, the encoders for every column are fit and saved
- when you call transform, the saved encoders are used to transform the data
- when you call inverse_transform, the saved encoders are used to do the inverse transformation
Example code:
from sklearn.preprocessing import LabelEncoder

class MultiColumnLabelEncoder:
    def __init__(self, columns=None):
        self.columns = columns  # array of column names to encode

    def fit(self, X, y=None):
        # Fit one LabelEncoder per column and keep it for later use
        self.encoders = {}
        columns = X.columns if self.columns is None else self.columns
        for col in columns:
            self.encoders[col] = LabelEncoder().fit(X[col])
        return self

    def transform(self, X):
        output = X.copy()
        columns = X.columns if self.columns is None else self.columns
        for col in columns:
            output[col] = self.encoders[col].transform(X[col])
        return output

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)

    def inverse_transform(self, X):
        # Reuse the saved encoders to map integer codes back to labels
        output = X.copy()
        columns = X.columns if self.columns is None else self.columns
        for col in columns:
            output[col] = self.encoders[col].inverse_transform(X[col])
        return output
You can then use it like this:
import pandas as pd

multi = MultiColumnLabelEncoder(columns=['city', 'size'])
df = pd.DataFrame({'city': ['London', 'Paris', 'Moscow'],
                   'size': ['M', 'M', 'L'],
                   'quantity': [12, 1, 4]})
X = multi.fit_transform(df)
print(X)
#    city  size  quantity
# 0     0     1        12
# 1     2     1         1
# 2     1     0         4
inv = multi.inverse_transform(X)
print(inv)
#      city size  quantity
# 0  London    M        12
# 1   Paris    M         1
# 2  Moscow    L         4
There could be a separate implementation of fit_transform that calls the same method of the LabelEncoders. Just make sure to keep the encoders around for when you need the inverse transformation.
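As a minimal sketch of such a separate fit_transform, the variant below calls LabelEncoder.fit_transform once per column and stores each fitted encoder in the same encoders dict, so inverse_transform still works afterwards. It is only one way this could be written, not necessarily what the answer's author had in mind; the fit and transform methods are omitted here for brevity.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


class MultiColumnLabelEncoder:
    def __init__(self, columns=None):
        self.columns = columns  # column names to encode; None means all columns

    def fit_transform(self, X, y=None):
        # Fit and transform each column in a single pass, keeping every
        # fitted encoder so that inverse_transform can reuse it later.
        output = X.copy()
        self.encoders = {}
        columns = X.columns if self.columns is None else self.columns
        for col in columns:
            encoder = LabelEncoder()
            output[col] = encoder.fit_transform(X[col])
            self.encoders[col] = encoder
        return output

    def inverse_transform(self, X):
        # Map integer codes back to the original labels, column by column
        output = X.copy()
        columns = X.columns if self.columns is None else self.columns
        for col in columns:
            output[col] = self.encoders[col].inverse_transform(X[col])
        return output


df = pd.DataFrame({'city': ['London', 'Paris', 'Moscow'],
                   'size': ['M', 'M', 'L'],
                   'quantity': [12, 1, 4]})
multi = MultiColumnLabelEncoder(columns=['city', 'size'])
encoded = multi.fit_transform(df)
restored = multi.inverse_transform(encoded)
```

Because the encoders are stored on the object, restored should contain the original city and size labels again, while quantity is left untouched throughout.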