
Problem description

I want to group my DataFrame by a specific column, then apply sklearn's preprocessing MinMaxScaler to each group and store the fitted scaler objects.

My starting point:

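import pandas as pd
from sklearn import preprocessing

df = pd.DataFrame(dict(ID=list('AAABBB'),
                       VL=(0, 10, 10, 100, 100, 200)))

scaler = {}  # ID -> MinMaxScaler fitted on that ID's rows
groups = df.groupby('ID')

for name, group in groups:
  scr = preprocessing.MinMaxScaler()
  scr.fit(group[['VL']])  # fit on the numeric column(s) only, not on the ID column
  scaler[name] = scr
  # store the result in df; rebinding `group` alone would not change df
  df.loc[group.index, 'SC'] = scr.transform(group[['VL']]).ravel()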

Is it possible to do this with df.groupby('ID').transform?
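A minimal sketch of one way `transform` could both scale and keep the scalers, assuming a side effect inside the transformed function is acceptable. The helper name `fit_and_scale` and the dict `scalers` are hypothetical. Inside `SeriesGroupBy.transform`, pandas sets each group's `Series.name` to the group key, which is used here as the dict key.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame(dict(ID=list('AAABBB'),
                       VL=(0, 10, 10, 100, 100, 200)))

scalers = {}  # hypothetical store: group key -> fitted MinMaxScaler


def fit_and_scale(s: pd.Series) -> pd.Series:
    """Fit a fresh scaler on one group, remember it, return the scaled values."""
    scr = MinMaxScaler()
    # MinMaxScaler expects 2-D input of shape (n_samples, n_features)
    scaled = scr.fit_transform(s.to_frame().astype(float))
    scalers[s.name] = scr  # s.name is the group key inside transform
    return pd.Series(scaled.ravel(), index=s.index)


df['SC'] = df.groupby('ID')['VL'].transform(fit_and_scale)
print(df)
print(scalers['B'].data_min_, scalers['B'].data_max_)
```

If pandas ever calls the function more than once on a group, the dict entry is simply overwritten with an equivalent scaler.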

Update

From my original DataFrame:

pd.DataFrame(dict(ID=list('AAABBB'),
                  VL=(0, 10, 10, 100, 100, 200)))

I want to scale all columns separately for each ID. In this example, the expected result is:

   A 0.0
   A 1.0
   A 1.0
   B 0.0
   B 0.0
   B 1.0

while keeping the information, i.e. the scaler object fitted with fit:

preprocessing.MinMaxScaler().fit( ... )
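For reference, the expected values above are plain min-max scaling within each ID. With its default `feature_range=(0, 1)`, `MinMaxScaler` computes x' = (x - min) / (max - min) per feature, and the fitted `data_min_` / `data_max_` attributes are the per-group information a stored scaler keeps. A pandas-only sketch that reproduces the table from that formula, without any scaler objects:

```python
import pandas as pd

df = pd.DataFrame(dict(ID=list('AAABBB'),
                       VL=(0, 10, 10, 100, 100, 200)))

g = df.groupby('ID')['VL']
vmin = g.transform('min')  # each row gets the minimum of its ID group
vmax = g.transform('max')  # each row gets the maximum of its ID group

# MinMaxScaler's formula for the default feature_range=(0, 1).
# Caveat: a constant group (max == min) gives NaN here (0/0), whereas
# MinMaxScaler maps a constant feature to 0.
df['SC'] = (df['VL'] - vmin) / (vmax - vmin)
print(df)
```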

Recommended answer

You can do it in one direction (scaling only):

In [62]: from sklearn.preprocessing import minmax_scale

In [63]: df
Out[63]:
  ID   VL
0  A    0
1  A   10
2  A   10
3  B  100
4  B  100
5  B  200

In [64]: df['SC'] = df.groupby('ID').VL.transform(lambda x: minmax_scale(x.astype(float)))

In [65]: df
Out[65]:
  ID   VL   SC
0  A    0  0.0
1  A   10  1.0
2  A   10  1.0
3  B  100  0.0
4  B  100  0.0
5  B  200  1.0

but you will not be able to use inverse_transform: minmax_scale fits a new MinMaxScaler for each group (each ID) and returns only the scaled values, so the per-group information about your original features (their min and max) is lost.
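If `inverse_transform` is needed, one workaround (a sketch, not part of the original answer) is to keep one fitted `MinMaxScaler` per ID in a dict, as the question's loop does, and invert each group with its own scaler. The names `scalers`, `SC` and `restored` are illustrative assumptions.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame(dict(ID=list('AAABBB'),
                       VL=(0, 10, 10, 100, 100, 200)))

scalers = {}    # ID -> MinMaxScaler fitted on that ID's rows only
df['SC'] = 0.0  # float column that receives the scaled values

for name, group in df.groupby('ID'):
    scr = MinMaxScaler().fit(group[['VL']])
    scalers[name] = scr
    df.loc[group.index, 'SC'] = scr.transform(group[['VL']]).ravel()

# Undo the scaling group by group, each with the scaler fitted on that group
restored = pd.Series(index=df.index, dtype=float)
for name, group in df.groupby('ID'):
    # inverse_transform expects 2-D input, one column per fitted feature
    restored.loc[group.index] = scalers[name].inverse_transform(
        group[['SC']].to_numpy()).ravel()

print(df.assign(restored=restored))
```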


10-26 19:55