How to reverse LabelEncoder on multiple columns in sklearn?

Problem description

I would like to use the inverse_transform function for LabelEncoder on multiple columns.

This is the code I use to apply LabelEncoder to more than one column of a dataframe:

from sklearn.preprocessing import LabelEncoder

class MultiColumnLabelEncoder:
    def __init__(self,columns = None):
        self.columns = columns # array of column names to encode

    def fit(self,X,y=None):
        return self # not relevant here

    def transform(self,X):
        '''
        Transforms columns of X specified in self.columns using
        LabelEncoder(). If no columns specified, transforms all
        columns in X.
        '''
        output = X.copy()
        if self.columns is not None:
            for col in self.columns:
                output[col] = LabelEncoder().fit_transform(output[col])
        else:
            for colname,col in output.items():
                output[colname] = LabelEncoder().fit_transform(col)
        return output

    def fit_transform(self,X,y=None):
        return self.fit(X,y).transform(X)

Is there a way to modify this code so that it can also be used to invert the labels produced by the encoder?

Thanks

Recommended answer

To inverse transform the data, you need to remember the encoder that was used to transform each column. One way to do this is to save the LabelEncoders in a dict inside your object. It would work like this:

  • when you call fit, the encoder for each column is fit and saved
  • when you call transform, the saved encoders are used to transform the data
  • when you call inverse_transform, the same saved encoders are used to do the inverse transformation
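
As a minimal single-column sketch of why the fitted encoder has to be kept: a fitted LabelEncoder stores its label-to-integer mapping in the classes_ attribute, and inverse_transform reads the labels back from there. The sample data and variable names below are illustrative assumptions, not part of the original question.

```python
from sklearn.preprocessing import LabelEncoder

cities = ['London', 'Paris', 'Moscow', 'Paris']  # illustrative sample data

encoder = LabelEncoder().fit(cities)

# The fitted encoder stores the sorted unique labels; each label's
# integer code is its index in classes_.
classes = encoder.classes_.tolist()

codes = encoder.transform(cities).tolist()

# Only this fitted instance knows how to map the codes back to labels.
restored = encoder.inverse_transform(codes).tolist()

print(classes)   # ['London', 'Moscow', 'Paris']
print(codes)     # [0, 2, 1, 2]
print(restored)  # ['London', 'Paris', 'Moscow', 'Paris']
```

A fresh LabelEncoder() has no classes_ yet, which is why creating a new encoder inside transform and discarding it leaves nothing to invert with.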

Example code:

from sklearn.preprocessing import LabelEncoder

class MultiColumnLabelEncoder:

    def __init__(self, columns=None):
        self.columns = columns # array of column names to encode


    def fit(self, X, y=None):
        self.encoders = {}
        columns = X.columns if self.columns is None else self.columns
        for col in columns:
            self.encoders[col] = LabelEncoder().fit(X[col])
        return self


    def transform(self, X):
        output = X.copy()
        columns = X.columns if self.columns is None else self.columns
        for col in columns:
            output[col] = self.encoders[col].transform(X[col])
        return output


    def fit_transform(self, X, y=None):
        return self.fit(X,y).transform(X)


    def inverse_transform(self, X):
        output = X.copy()
        columns = X.columns if self.columns is None else self.columns
        for col in columns:
            output[col] = self.encoders[col].inverse_transform(X[col])
        return output

You can then use it like this:

import pandas as pd

multi = MultiColumnLabelEncoder(columns=['city','size'])
df = pd.DataFrame({'city':    ['London','Paris','Moscow'],
                   'size':    ['M',     'M',    'L'],
                   'quantity':[12,       1,      4]})
X = multi.fit_transform(df)
print(X)
#    city  size  quantity
# 0     0     1        12
# 1     2     1         1
# 2     1     0         4
inv = multi.inverse_transform(X)
print(inv)
#      city size  quantity
# 0  London    M        12
# 1   Paris    M         1
# 2  Moscow    L         4

Alternatively, fit_transform could be implemented separately so that it calls each LabelEncoder's own fit_transform method. Either way, make sure to keep the fitted encoders around for when you need the inverse transformation.
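
One possible sketch of that separate fit_transform, assuming the same encoders dict as in the example code: each column's LabelEncoder.fit_transform is called directly and the fitted encoder is stored afterwards. The class is restated here in minimal form (without fit and transform) only so that the snippet runs on its own; the sample DataFrame mirrors the usage example.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


class MultiColumnLabelEncoder:
    """Minimal restatement of the class with a dedicated fit_transform."""

    def __init__(self, columns=None):
        self.columns = columns  # column names to encode; None means all columns

    def fit_transform(self, X, y=None):
        output = X.copy()
        self.encoders = {}
        columns = X.columns if self.columns is None else self.columns
        for col in columns:
            encoder = LabelEncoder()
            # Fit and encode in one call, then keep the fitted encoder.
            output[col] = encoder.fit_transform(X[col])
            self.encoders[col] = encoder
        return output

    def inverse_transform(self, X):
        output = X.copy()
        columns = X.columns if self.columns is None else self.columns
        for col in columns:
            output[col] = self.encoders[col].inverse_transform(X[col])
        return output


df = pd.DataFrame({'city': ['London', 'Paris', 'Moscow'],
                   'size': ['M', 'M', 'L'],
                   'quantity': [12, 1, 4]})
multi = MultiColumnLabelEncoder(columns=['city', 'size'])
encoded = multi.fit_transform(df)
decoded = multi.inverse_transform(encoded)
print(encoded)
print(decoded)
```

The result is the same as fit followed by transform; the only difference is that each column is passed through its encoder once instead of twice.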
