Applying a custom groupby aggregate function to find the average of NumPy arrays

Problem description

I have a pandas DataFrame in which column B contains NumPy arrays of a fixed size.

|------|---------------|-------|
|  A   |       B       |   C   |
|------|---------------|-------|
|  0   |   [2,3,5,6]   |   X   |
|------|---------------|-------|
|  1   |   [1,2,3,4]   |   X   |
|------|---------------|-------|
|  2   |   [2,3,6,5]   |   Y   |
|------|---------------|-------|
|  3   |   [2,3,2,3]   |   Y   |
|------|---------------|-------|
|  4   |   [2,3,4,4]   |   Y   |
|------|---------------|-------|
|  5   |   [2,3,5,6]   |   Z   |
|------|---------------|-------|
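
For readers who want to follow along, the table above could be built roughly as follows. This is only a sketch: it assumes each cell of B holds a 1-D NumPy array, and the construction itself is not part of the original question.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the example data; each B cell is a NumPy array.
df = pd.DataFrame({
    "A": range(6),
    "B": [np.array(v) for v in (
        [2, 3, 5, 6], [1, 2, 3, 4], [2, 3, 6, 5],
        [2, 3, 2, 3], [2, 3, 4, 4], [2, 3, 5, 6],
    )],
    "C": ["X", "X", "Y", "Y", "Y", "Z"],
})

# B is stored with dtype object, which is why pandas does not
# treat it as a numeric column when aggregating.
print(df.dtypes)
```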

I want to group the rows by column 'C' and compute the element-wise average of the 'B' arrays in each group, returned as a list, as in the table below. I would like to do this efficiently.

|----------------|-------|
|        B       |   C   |
|----------------|-------|
|  [1.5,2.5,4,5] |   X   |
|----------------|-------|
|    [2,3,4,4]   |   Y   |
|----------------|-------|
|    [2,3,5,6]   |   Z   |
|----------------|-------|

I have considered splitting the NumPy arrays into individual columns, but that would be my last resort.

How can I write a custom aggregate function? Right now column B is treated as non-numeric, and aggregating it raises:

DataError: No numeric types to aggregate 

Recommended answer

This is possible by converting the values of each group to a 2D array and then applying np.mean along axis 0:

import numpy as np
# Stack each group's arrays into a 2D array, then average down the rows
f = lambda x: np.mean(np.array(x.tolist()), axis=0)
df2 = df.groupby('C')['B'].apply(f).reset_index()
print(df2)
   C                     B
0  X  [1.5, 2.5, 4.0, 5.0]
1  Y  [2.0, 3.0, 4.0, 4.0]
2  Z  [2.0, 3.0, 5.0, 6.0]
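
To see why the 2D conversion works, it may help to run the same steps on a single group by hand. The snippet below is a minimal sketch using the three B arrays of group Y; the names `group_y`, `stacked`, and `col_mean` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the B values that groupby passes in for group Y.
group_y = pd.Series([
    np.array([2, 3, 6, 5]),
    np.array([2, 3, 2, 3]),
    np.array([2, 3, 4, 4]),
])

# A list of equal-length 1-D arrays becomes one 2-D array: one row per original array.
stacked = np.array(group_y.tolist())

# axis=0 averages down the rows, i.e. position by position across the arrays.
col_mean = stacked.mean(axis=0)

print(stacked.shape)
print(col_mean)
```

This only works because every array in B has the same length; with ragged arrays, `np.array` would not produce a regular 2D array.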

The last-resort option (splitting the arrays into columns) also works, but is less efficient (thanks to @Abhik Sarkar for testing):

# Expand B into one column per position, average per group, then pack the rows back into lists
df1 = pd.DataFrame(df.B.tolist()).groupby(df['C']).mean()
df2 = pd.DataFrame({'B': df1.values.tolist(), 'C': df1.index})
print(df2)
                      B  C
0  [1.5, 2.5, 4.0, 5.0]  X
1  [2.0, 3.0, 4.0, 4.0]  Y
2  [2.0, 3.0, 5.0, 6.0]  Z
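
The relative speed of the two approaches can be checked on your own data with `timeit`. The following is a rough sketch on synthetic data: the row count, array width, repeat count, and the helper names `via_apply` and `via_columns` are all arbitrary choices for illustration, and the timings will depend on the pandas version and the data shape.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows, width = 2_000, 4  # arbitrary sizes for the synthetic data

df = pd.DataFrame({
    "B": list(rng.integers(0, 10, size=(n_rows, width))),  # one 1-D array per row
    "C": rng.choice(["X", "Y", "Z"], size=n_rows),
})

def via_apply(frame):
    # First approach: stack each group's arrays and average along axis 0.
    return frame.groupby("C")["B"].apply(
        lambda x: np.mean(np.array(x.tolist()), axis=0)
    )

def via_columns(frame):
    # Second approach: expand B into separate columns, then take group means.
    return pd.DataFrame(frame["B"].tolist()).groupby(frame["C"]).mean()

# Sanity check: both approaches should agree before their speed is compared.
a = via_apply(df)
b = via_columns(df)
same = np.allclose(np.vstack(a.tolist()), b.to_numpy())
print("results match:", same)

for fn in (via_apply, via_columns):
    seconds = timeit.timeit(lambda: fn(df), number=20)
    print(f"{fn.__name__}: {seconds:.4f} s for 20 runs")
```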


09-27 16:52