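In pandas 1.5.3, calling groupby with group_keys=True and then apply puts the groupby column in the first level of the result's index. In the code below, time_id becomes the first index level, and the original index of the DataFrame (df) is kept as the second level.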

>>> df.groupby(['time_id'], group_keys=True)['wap'].apply(log_return)
time_id         
0        0               NaN
         1          0.000000
         2          0.000000
         3          0.000000
         4          0.000000
                      ...   
26454    5237975   -0.001228
         5237976    0.000491
         5237977   -0.005031
         5237978    0.003219
         5237979    0.003264
Name: wap, Length: 5237980, dtype: float64

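group_keys controls whether the groupby feature (here time_id) is added to the result's index as keys: True adds it, False leaves it out. This matches the description in the help text: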
group_keys : bool, optional
    When calling apply and the by argument produces a like-indexed
    (i.e. :ref:`a transform <groupby.transform>`) result, add group keys to
    index to identify pieces. By default group keys are not included
    when the result’s index (and column) labels match the inputs, and
    are included otherwise. This argument has no effect if the result produced
    is not like-indexed with respect to the input.
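The last sentence of the help text can be checked on a toy frame. The sketch below is only an illustration: the DataFrame `toy` and its values are made up, and the lambda is a stand-in for any function that reduces each group to a single value, so the result is not like-indexed with the input.

```python
import pandas as pd

# Hypothetical toy data; only the column names follow the post.
toy = pd.DataFrame({"time_id": [0, 0, 1, 1], "wap": [1.0, 3.0, 2.0, 4.0]})

# The function returns one scalar per group, so the result has one row
# per time_id rather than one row per input row.
mean_with_keys = toy.groupby("time_id", group_keys=True)["wap"].apply(lambda s: s.mean())
mean_without_keys = toy.groupby("time_id", group_keys=False)["wap"].apply(lambda s: s.mean())

# Both results are indexed by time_id; group_keys makes no difference here.
print(mean_with_keys.equals(mean_without_keys))
print(list(mean_without_keys.index))
```

In this reduced case the group key is the only sensible index, so it stays in the result whichever value group_keys takes.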
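So when group_keys=False is set, the group keys (time_id) no longer appear in the returned result, as shown below. With False, the result keeps the original index of df, so it can be assigned directly as a new column of the original DataFrame (df), which is very convenient.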

>>> df.groupby(['time_id'], group_keys=False)['wap'].apply(log_return)
0               NaN
1          0.000000
2          0.000000
3          0.000000
4          0.000000
             ...   
5237975   -0.001228
5237976    0.000491
5237977   -0.005031
5237978    0.003219
5237979    0.003264
Name: wap, Length: 5237980, dtype: float64
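The two behaviours can be reproduced on a small frame. This is a minimal sketch under assumptions: the post does not show the body of `log_return`, so the version here (difference of consecutive log prices within a group) is a guess, and the data is invented. It assumes pandas 1.5 or later, where an explicit group_keys=True is honoured for like-indexed results.

```python
import numpy as np
import pandas as pd


def log_return(series: pd.Series) -> pd.Series:
    # Assumed definition, not taken from the post: log price differences.
    return np.log(series).diff()


# Hypothetical toy data with the same column names as in the post.
df = pd.DataFrame({
    "time_id": [0, 0, 0, 1, 1],
    "wap": [1.0, 1.1, 1.21, 2.0, 2.2],
})

with_keys = df.groupby(["time_id"], group_keys=True)["wap"].apply(log_return)
without_keys = df.groupby(["time_id"], group_keys=False)["wap"].apply(log_return)

# group_keys=True: two index levels (time_id, original row label).
print(with_keys.index.nlevels)
# group_keys=False: the index is exactly the original one.
print(without_keys.index.equals(df.index))

# Because the index matches df, the result aligns row by row as a new column.
df["log_return"] = without_keys
print(df)
```

If a result was already computed with group_keys=True, dropping the first index level with `with_keys.droplevel(0)` should bring it back to the original row labels before assignment.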

PS: Really understanding the English help text takes trying it out in practice.

11-11 22:34