If this is a duplicate, please link me to the original. I haven't found any other post that answers my question.

I have a DataFrame knn_res with the following shape and data:

            username  Prediction  is_bot
0         megliebsch           1       0
1         megliebsch           1       0
2         megliebsch           1       0
3         megliebsch           1       0
4         megliebsch           1       0
...              ...         ...     ...
1220     ARTHCLAUDIA           1       1
1221     ARTHCLAUDIA           1       1
1222     ARTHCLAUDIA           1       1
1223     ARTHCLAUDIA           1       1
1224  ASSUNCAOWALLAS           1       1

[1225 rows x 3 columns]


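For each username, I want to count the number of rows with prediction = 1 and the number with prediction = 0, and use those values to create two new columns. For example, given the following dataset: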

| username | prediction | is_bot |
|:--------:|:----------:|:------:|
|    foo   |      1     |    1   |
|    foo   |      1     |    1   |
|    foo   |      1     |    1   |
|    foo   |      0     |    1   |
|    foo   |      0     |    1   |
|   foo1   |      0     |    1   |
|   foo1   |      0     |    1   |
|   foo1   |      0     |    0   |
|   foo1   |      0     |    0   |
|   foo1   |      1     |    0   |
|   foo1   |      1     |    0   |
|   foo1   |      0     |    0   |
|   foo2   |      1     |    0   |
|   foo2   |      1     |    0   |
|   foo2   |      1     |    1   |


I want:

| username | count_bot  | count_human |
|:--------:|:----------:|:-----------:|
|    foo   |      3     |      2      |
|   foo1   |      2     |      5      |
|   foo2   |      3     |      0      |


The following logic applies:


  For each row, if Prediction == 1, increment the count_bot counter. If Prediction == 0, increment the count_human counter. Then group by username and report the totals for each group.
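
The logic above can be sketched literally as a loop over rows. This is only an illustration in plain Python (no pandas), assuming the (username, prediction) pairs from the example table; the names `rows` and `counts` are made up for the sketch.

```python
from collections import defaultdict

# (username, prediction) pairs copied from the example dataset above.
rows = (
    [("foo", 1)] * 3 + [("foo", 0)] * 2
    + [("foo1", 0)] * 4 + [("foo1", 1)] * 2 + [("foo1", 0)]
    + [("foo2", 1)] * 3
)

# One pair of counters per username, both starting at zero.
counts = defaultdict(lambda: {"count_bot": 0, "count_human": 0})

for username, prediction in rows:
    if prediction == 1:
        counts[username]["count_bot"] += 1
    elif prediction == 0:
        counts[username]["count_human"] += 1

for username, c in counts.items():
    print(username, c["count_bot"], c["count_human"])
```

A row-by-row loop like this is slow on a real DataFrame, which is why a grouped, vectorised count is usually preferred.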


So far, I have tried following this question and attempted the following:

knn_res['count_bot'] = knn_res[knn_res.Prediction == 1].count()
print(knn_res)


Which produces:

            username  Prediction  is_bot  count_bot
0         megliebsch           1       0        NaN
1         megliebsch           1       0        NaN
2         megliebsch           1       0        NaN
3         megliebsch           1       0        NaN
4         megliebsch           1       0        NaN
...              ...         ...     ...        ...
1220     ARTHCLAUDIA           1       1        NaN
1221     ARTHCLAUDIA           1       1        NaN
1222     ARTHCLAUDIA           1       1        NaN
1223     ARTHCLAUDIA           1       1        NaN
1224  ASSUNCAOWALLAS           1       1        NaN


Trying:

new = knn_res.groupby('username').sum()
print(new)


Yields:

                 Prediction  is_bot
username
666STEVEROGERS           25      25
ADELE_BROCK               1      25
ADRIANAMFTTT             24      25
AHMADRADJAB               1      25
ALBERTA_HAYNESS          24      25
ALTMANBELINDA            23      25
ALVA_MC_GHEE             25      25
ANGELITHSS               25      25
ANN1EMCCONNELL           25      25
ANWARJAMIL22             25      25
AN_N_GASTON              25      25
ARONHOLDEN8              25      25
ARTHCLAUDIA              25      25
ASSUNCAOWALLAS            1       1
BECCYWILL                 9      25
BELOZEROVNIKIT           17      25
BEN_SAR_GENT              1      25
BERT_HENLEY              24      25
BISHOLORINE              25      25
BLACKERTHEBERR5          11      25
BLACKTIVISTSUS            7      25
BLACK_ELEVATION          24      25
BOGDANOVAO2               7      25
BREMENBOTE               10      25
B_stever96                1       0
CALIFRONIAREP            24      25
C_dos_94                 25      24
Cassidygirly             25       0
ChuckSpeaks_             25       0
Cyabooty                  0       0
DurkinSays                1       0
LSU_studyabroad          24       0
MisMonWEXP                0       0
NextLevel_Mel            25       0
PeterDuca                24       0
ShellMarcel              25       0
Sir_Fried_Alott          25       0
XavierRivera_             0       0
ZacharyFlair              0       0
brentvarney44             1       0
cbars68                   0       0
chloeschultz11           25       0
hoang_le_96               1       0
kdougherty178            25       0
lasallephilo              0       0
lovely_cunt_              1       0
megliebsch               24       0
msimps_15                24       0
okweightlossdna          24       0
tankthe_hank             24       0


What am I doing wrong, and how can I get the desired result?

Best answer

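Group by username and prediction so that rows sharing the same values of username and prediction fall into the same group. For each username, the rows with prediction 0 and the rows with prediction 1 end up in separate groups. Call count on each group (note: I changed the column selected before count from is_bot to prediction, as needed). Finally, unstack to move 0 and 1 into columns, then rename them as desired: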

df_out = (df.groupby(['username', 'prediction']).prediction.count().unstack(fill_value=0).
             rename({0: 'count_human', 1: 'count_bot'}, axis= 1))

Out[30]:
prediction  count_human  count_bot
username
foo                   2          3
foo1                  5          2
foo2                  0          3
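
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes `df` is built from the example table in the question; the last step (`flat`, a name introduced only for this sketch) is optional and just reshapes the result into the flat layout and column order the question asked for. On the real knn_res frame the column is spelled `Prediction`, so the names would need adjusting.

```python
import pandas as pd

# Sample data reconstructed from the example table in the question.
df = pd.DataFrame({
    "username":   ["foo"] * 5 + ["foo1"] * 7 + ["foo2"] * 3,
    "prediction": [1, 1, 1, 0, 0,  0, 0, 0, 0, 1, 1, 0,  1, 1, 1],
    "is_bot":     [1, 1, 1, 1, 1,  1, 1, 0, 0, 0, 0, 0,  0, 0, 1],
})

# Same chain as in the answer: count per (username, prediction), then pivot.
df_out = (
    df.groupby(["username", "prediction"])["prediction"]
      .count()
      .unstack(fill_value=0)
      .rename(columns={0: "count_human", 1: "count_bot"})
)

# Optional: reorder columns, drop the "prediction" axis label,
# and turn username back into an ordinary column.
flat = (
    df_out[["count_bot", "count_human"]]
      .rename_axis(columns=None)
      .reset_index()
)
print(flat)
```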




Step by step:

Group by username and prediction, then count the rows in each group, i.e. the 0s and the 1s for each username:

df.groupby(['username', 'prediction']).prediction.count()

Out[32]:
username  prediction
foo       0             2
          1             3
foo1      0             5
          1             2
foo2      1             3
Name: prediction, dtype: int64
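
A short sketch of why a single column is selected before calling count, assuming the same sample `df` as in the question's example table. Without the selection, count returns one column per remaining non-key column; `size()` is shown as a related call that counts rows per group regardless of missing values. The variable names are illustrative only.

```python
import pandas as pd

# Sample data reconstructed from the example table in the question.
df = pd.DataFrame({
    "username":   ["foo"] * 5 + ["foo1"] * 7 + ["foo2"] * 3,
    "prediction": [1, 1, 1, 0, 0,  0, 0, 0, 0, 1, 1, 0,  1, 1, 1],
    "is_bot":     [1, 1, 1, 1, 1,  1, 1, 0, 0, 0, 0, 0,  0, 0, 1],
})

grouped = df.groupby(["username", "prediction"])

# count() on the whole group: a DataFrame with the non-key columns (is_bot here).
per_column = grouped.count()

# Selecting one column first: a Series of non-null counts per group.
as_series = grouped["prediction"].count()

# size(): a Series with the number of rows per group, missing values included.
sizes = grouped.size()

print(per_column)
print(as_series)
print(sizes)
```

With no missing values in the data, `as_series` and `sizes` hold the same numbers; they only differ when the counted column contains NaN.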


Unstack to move the prediction index level into columns:

df.groupby(['username', 'prediction']).prediction.count().unstack(fill_value=0)

Out[33]:
prediction  0  1
username
foo         2  3
foo1        5  2
foo2        0  3
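
As a side note, the group, count, and unstack steps up to this point can also be written with `pd.crosstab`. This is an alternative sketch rather than what the answer uses, and it again assumes the sample `df` from the question's example table.

```python
import pandas as pd

# Sample data reconstructed from the example table in the question.
df = pd.DataFrame({
    "username":   ["foo"] * 5 + ["foo1"] * 7 + ["foo2"] * 3,
    "prediction": [1, 1, 1, 0, 0,  0, 0, 0, 0, 1, 1, 0,  1, 1, 1],
    "is_bot":     [1, 1, 1, 1, 1,  1, 1, 0, 0, 0, 0, 0,  0, 0, 1],
})

# The answer's approach up to the unstack step.
via_groupby = (
    df.groupby(["username", "prediction"])["prediction"]
      .count()
      .unstack(fill_value=0)
)

# crosstab builds the same username x prediction frequency table directly.
via_crosstab = pd.crosstab(df["username"], df["prediction"])

print(via_crosstab)
```

Combinations that never occur (such as foo2 with prediction 0) show up as 0 in the crosstab, so no fill value is needed.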


Finally, rename the columns to match the desired output:

(df.groupby(['username', 'prediction']).prediction.count().unstack(fill_value=0).
    rename({0: 'count_human', 1: 'count_bot'}, axis= 1))

Out[34]:
prediction  count_human  count_bot
username
foo                   2          3
foo1                  5          2
foo2                  0          3
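
As for why the two attempts in the question fall short, the sketch below reproduces both on the sample data (assumed to be the example table from the question, with a lowercase `prediction` column). In the first attempt, `count()` on a filtered frame returns a Series indexed by column names, so assigning it as a new column finds no matching row labels and fills NaN. In the second, `sum()` over a 0/1 column gives only the number of 1s, so the number of 0s is never computed.

```python
import pandas as pd

# Sample data reconstructed from the example table in the question.
df = pd.DataFrame({
    "username":   ["foo"] * 5 + ["foo1"] * 7 + ["foo2"] * 3,
    "prediction": [1, 1, 1, 0, 0,  0, 0, 0, 0, 1, 1, 0,  1, 1, 1],
    "is_bot":     [1, 1, 1, 1, 1,  1, 1, 0, 0, 0, 0, 0,  0, 0, 1],
})

# Attempt 1: count() here yields one number per column, not per row.
counts = df[df.prediction == 1].count()
print(counts.index.tolist())  # ['username', 'prediction', 'is_bot']

# Assigning it as a column aligns on the row index (0..14); the labels
# 'username', 'prediction', 'is_bot' match nothing, so every value is NaN.
attempt1 = df.assign(count_bot=counts)
print(attempt1["count_bot"].isna().all())  # True

# Attempt 2: summing a 0/1 column counts the 1s only, and is_bot gets summed too.
sums = df.groupby("username").sum()
print(sums)
```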
