sklearn中labelEncoder的工作

本文介绍了sklearn中labelEncoder的工作的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

说我具有以下输入功能:

Say I have the following input feature:

hotel_id = [1, 2, 3, 2, 3]

这是具有数字值的分类功能.如果按原样将其提供给模型，则模型会将其视为连续变量，即2>1.

This is a categorical feature with numeric values. If I give it to the model as it is, the model will treat it as continuous variable, ie., 2 > 1.

如果我应用 sklearn.labelEncoder()，那么我会得到:

If I apply sklearn.labelEncoder() then I will get:

hotel_id = [0, 1, 2, 1, 2]

因此，此编码功能被认为是连续的还是分类的?如果将其视为连续的，那么labelEncoder()的用途是什么.

So this encoded feature is considered as continuous or categorical?If it is treated as continuous then whats the use of labelEncoder().

P.S.我知道一种热编码.但是大约有100个hotel_id，所以不想使用它.谢谢

P.S. I know about one hot encoding. But there are around 100 hotel_ids so dont want to use it. Thanks

推荐答案

LabelEncoder 是一种编码类级别的方法.除了您提供的整数示例外，请考虑以下示例:

The LabelEncoder is a way to encode class levels. In addition to the integer example you've included, consider the following example:

>>> from sklearn.preprocessing import LabelEncoder
>>> le = LabelEncoder()
>>>
>>> train = ["paris", "paris", "tokyo", "amsterdam"]
>>> test = ["tokyo", "tokyo", "paris"]
>>> le.fit(train).transform(test)
array([2, 2, 1]...)

LabelEncoder 允许我们做的是为分类数据分配序数级别.但是，您所注意到的是正确的:即，将 [2，2，1] 视为数字数据.对于将 OneHotEncoder 用于伪变量(我知道您说过您不希望使用此变量)，这是一个不错的选择.

What the LabelEncoder allows us to do, then, is to assign ordinal levels to categorical data. However, what you've noted is correct: namely, the [2, 2, 1] is treated as numeric data. This is a good candidate for using the OneHotEncoder for dummy variables (which I know you said you were hoping not to use).

请注意， LabelEncoder 必须在一次热编码之前使用，因为 OneHotEncoder 无法处理分类数据.因此，它经常被用作单热编码的前兆.

Note that the LabelEncoder must be used prior to one-hot encoding, as the OneHotEncoder cannot handle categorical data. Therefore, it is frequently used as pre-cursor to one-hot encoding.

或者，它可以将目标编码为可用数组.例如，如果 train 是分类的目标，则需要一个 LabelEncoder 并将其用作您的y变量.

Alternatively, it can encode your target into a usable array. If, for instance, train were your target for classification, you would need a LabelEncoder to use it as your y variable.

这篇关于sklearn中labelEncoder的工作的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持！