Problem description
I have a pandas DataFrame in which column B contains NumPy arrays of a fixed size.
|------|---------------|-------|
| A | B | C |
|------|---------------|-------|
| 0 | [2,3,5,6] | X |
|------|---------------|-------|
| 1 | [1,2,3,4] | X |
|------|---------------|-------|
| 2 | [2,3,6,5] | Y |
|------|---------------|-------|
| 3 | [2,3,2,3] | Y |
|------|---------------|-------|
| 4 | [2,3,4,4] | Y |
|------|---------------|-------|
| 5 | [2,3,5,6] | Z |
|------|---------------|-------|
I want to group the rows by column 'C' and compute the element-wise average of the 'B' values, returned as a list, as in the table below. I want to do this efficiently.
|----------------|-------|
| B | C |
|----------------|-------|
| [1.5,2.5,4,5] | X |
|----------------|-------|
| [2,3,4,4] | Y |
|----------------|-------|
| [2,3,5,6] | Z |
|----------------|-------|
I have considered splitting the NumPy arrays into individual columns, but that would be my last option.
How can I write a custom aggregate function? At the moment column B is treated as non-numeric, and the aggregation fails with:
DataError: No numeric types to aggregate
Recommended answer
You can do this by converting each group's values to a 2-D array and then taking np.mean along axis 0:
import numpy as np
# Stack each group's arrays into a 2-D array, then average down the rows.
f = lambda x: np.mean(np.array(x.tolist()), axis=0)
df2 = df.groupby('C')['B'].apply(f).reset_index()
print(df2)
C B
0 X [1.5, 2.5, 4.0, 5.0]
1 Y [2.0, 3.0, 4.0, 4.0]
2 Z [2.0, 3.0, 5.0, 6.0]
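For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The DataFrame is rebuilt from the table in the question, and the helper name mean_of_arrays is only illustrative (it is not from the original answer); np.stack is used in place of np.array, which should behave the same when every array has the same length.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's table; B holds fixed-size NumPy arrays.
df = pd.DataFrame({
    "A": range(6),
    "B": [np.array(v) for v in (
        [2, 3, 5, 6], [1, 2, 3, 4], [2, 3, 6, 5],
        [2, 3, 2, 3], [2, 3, 4, 4], [2, 3, 5, 6],
    )],
    "C": ["X", "X", "Y", "Y", "Y", "Z"],
})

def mean_of_arrays(group):
    # Hypothetical helper: stack the group's arrays into shape (n_rows, 4),
    # then average over the rows to get one array per group.
    return np.mean(np.stack(group.to_list()), axis=0)

# apply (rather than agg) is used because the function returns an array, not a scalar.
result = df.groupby("C")["B"].apply(mean_of_arrays).reset_index()
print(result)
```

Each cell of result["B"] is a NumPy array; call .tolist() on it if plain Python lists are needed.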
The approach you called your last option, splitting the arrays into columns, also works, but it is less efficient (thanks to @Abhik Sarkar for testing):
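import pandas as pd
# Expand B into one column per position, group by C, then pack the means back into lists.
df1 = pd.DataFrame(df.B.tolist()).groupby(df['C']).mean()
df2 = pd.DataFrame({'B': df1.values.tolist(), 'C': df1.index})
print(df2)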
B C
0 [1.5, 2.5, 4.0, 5.0] X
1 [2.0, 3.0, 4.0, 4.0] Y
2 [2.0, 3.0, 5.0, 6.0] Z
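The column-splitting variant can be sketched the same way. This is only an illustration built on the question's sample data; the intermediate names wide and means are assumptions introduced here, and to_numpy() stands in for the older .values attribute.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's table.
df = pd.DataFrame({
    "A": range(6),
    "B": [np.array(v) for v in (
        [2, 3, 5, 6], [1, 2, 3, 4], [2, 3, 6, 5],
        [2, 3, 2, 3], [2, 3, 4, 4], [2, 3, 5, 6],
    )],
    "C": ["X", "X", "Y", "Y", "Y", "Z"],
})

# One numeric column per array position, keeping the original row index
# so the rows still line up with df["C"].
wide = pd.DataFrame(df["B"].to_list(), index=df.index)

# Ordinary numeric groupby mean over the expanded columns.
means = wide.groupby(df["C"]).mean()

# Pack each row of means back into a single list-valued column.
result = pd.DataFrame({"B": means.to_numpy().tolist(), "C": means.index})
print(result)
```

Here every cell of result["B"] is a plain Python list rather than a NumPy array.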