How to inverse-transform sklearn's LabelEncoder on multiple columns?

Problem description

I would like to use the inverse_transform function of LabelEncoder on multiple columns.

This is the code I use to apply LabelEncoder to more than one column of a DataFrame:

from sklearn.preprocessing import LabelEncoder

class MultiColumnLabelEncoder:
    def __init__(self,columns = None):
        self.columns = columns # array of column names to encode

    def fit(self,X,y=None):
        return self # not relevant here

    def transform(self,X):
        '''
        Transforms columns of X specified in self.columns using
        LabelEncoder(). If no columns specified, transforms all
        columns in X.
        '''
        output = X.copy()
        if self.columns is not None:
            for col in self.columns:
                output[col] = LabelEncoder().fit_transform(output[col])
        else:
            for colname,col in output.items():  # iteritems() was removed in pandas 2.0
                output[colname] = LabelEncoder().fit_transform(col)
        return output

    def fit_transform(self,X,y=None):
        return self.fit(X,y).transform(X)

Is there a way to modify this code so that it can also be used to invert the labels produced by the encoder?

Thanks.

Recommended answer

To inverse transform the data, you need to remember the encoder that was used to transform each column. One way to do this is to save the fitted LabelEncoders in a dict inside your object, keyed by column name. It would work like this:

When you call fit, an encoder is fit for each column and saved.
When you call transform, the saved encoders are used to transform the data.
When you call inverse_transform, the same encoders are used to do the inverse transformation.

Example code:

import pandas as pd
from sklearn.preprocessing import LabelEncoder

class MultiColumnLabelEncoder:

    def __init__(self, columns=None):
        self.columns = columns # array of column names to encode


    def fit(self, X, y=None):
        self.encoders = {}
        columns = X.columns if self.columns is None else self.columns
        for col in columns:
            self.encoders[col] = LabelEncoder().fit(X[col])
        return self


    def transform(self, X):
        output = X.copy()
        columns = X.columns if self.columns is None else self.columns
        for col in columns:
            output[col] = self.encoders[col].transform(X[col])
        return output


    def fit_transform(self, X, y=None):
        return self.fit(X,y).transform(X)


    def inverse_transform(self, X):
        output = X.copy()
        columns = X.columns if self.columns is None else self.columns
        for col in columns:
            output[col] = self.encoders[col].inverse_transform(X[col])
        return output

You can then use it like this:

multi = MultiColumnLabelEncoder(columns=['city','size'])
df = pd.DataFrame({'city':    ['London','Paris','Moscow'],
                   'size':    ['M',     'M',    'L'],
                   'quantity':[12,       1,      4]})
X = multi.fit_transform(df)
print(X)
#    city  size  quantity
# 0     0     1        12
# 1     2     1         1
# 2     1     0         4
inv = multi.inverse_transform(X)
print(inv)
#      city size  quantity
# 0  London    M        12
# 1   Paris    M         1
# 2  Moscow    L         4
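The integer codes in the output above follow from how LabelEncoder orders its classes: the fitted classes_ attribute holds the unique values in sorted order, and each value is encoded as its index in that array. The following is a minimal sketch on a single column, using the same city values as the example; it is only meant to illustrate the mapping, not to replace the class above.

```python
from sklearn.preprocessing import LabelEncoder

# Fit one encoder on the city values from the example above.
encoder = LabelEncoder().fit(['London', 'Paris', 'Moscow'])

# classes_ stores the unique labels in sorted order;
# a label's code is its position in this array.
classes = encoder.classes_.tolist()
print(classes)  # ['London', 'Moscow', 'Paris']

# Encoding maps each label to its index in classes_.
codes = encoder.transform(['London', 'Paris', 'Moscow'])
print(codes.tolist())  # [0, 2, 1]

# inverse_transform looks the codes back up in classes_.
restored = encoder.inverse_transform(codes).tolist()
print(restored)  # ['London', 'Paris', 'Moscow']
```

Because the mapping lives in classes_, only the encoder that was fitted on a given column can decode that column correctly, which is why each column needs its own saved encoder.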

If you prefer, fit_transform could instead be implemented separately so that it calls each LabelEncoder's own fit_transform. Either way, make sure to keep the fitted encoders around for when you need the inverse transformation.
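As a rough sketch of that alternative, the variant below implements fit_transform directly on top of LabelEncoder.fit_transform while still storing every fitted encoder. It is one possible reading of the remark above, not the answer's own code; the class is trimmed to the two methods needed for the round trip, and the variable names encoded and restored are chosen here for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


class MultiColumnLabelEncoder:
    def __init__(self, columns=None):
        self.columns = columns  # column names to encode; None means all columns

    def fit_transform(self, X, y=None):
        # Fit and encode each column in a single pass, keeping every
        # fitted encoder so inverse_transform can look it up later.
        self.encoders = {}
        output = X.copy()
        columns = X.columns if self.columns is None else self.columns
        for col in columns:
            encoder = LabelEncoder()
            output[col] = encoder.fit_transform(X[col])
            self.encoders[col] = encoder
        return output

    def inverse_transform(self, X):
        # Decode each column with the encoder that was fitted on it.
        output = X.copy()
        columns = X.columns if self.columns is None else self.columns
        for col in columns:
            output[col] = self.encoders[col].inverse_transform(X[col])
        return output


df = pd.DataFrame({'city': ['London', 'Paris', 'Moscow'],
                   'size': ['M', 'M', 'L'],
                   'quantity': [12, 1, 4]})
multi = MultiColumnLabelEncoder(columns=['city', 'size'])
encoded = multi.fit_transform(df)
restored = multi.inverse_transform(encoded)
```

The important part is the self.encoders dict: without it, the per-column mapping is lost as soon as fit_transform returns, and the inverse transformation is no longer possible.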
