Problem description
Hi, this is a follow-up to one of my previous questions, "How do I perform a vlookup-equivalent operation on my dataframe with some additional conditions".
As in the other question, my first dataframe is:
import pandas as pd

# Named `products` and `numbers` so the built-in `list` is not shadowed
products = ['Computer', 'AA', 'Monitor', 'BB', 'Printer1', 'BB', 'Desk', 'AA', 'Printer2', 'DD', 'Desk', 'BB']
numbers = [1500, 232, 300, 2323, 150, 2323, 250, 2323, 23, 34, 45, 56]
df = pd.DataFrame(products, columns=['product'])
df['number'] = numbers
What if my second dataframe has multiple values for, say, 'AA', as shown below?
list_n = ['AA','AA','BB','BB','CC','DD']
list_n2 = ['Y','N','N','Y','N','Y']
df2 = pd.DataFrame(list_n,columns=['product'])
df2['to_add'] = list_n2
This is what it looks like:
product to_add
0 AA Y
1 AA N
2 BB N
3 BB Y
4 CC N
5 DD Y
When I run pd.merge(df, df2, on="product", how="left"), I get:
product number to_add
0 Computer 1500 NaN
1 AA 232 Y
2 AA 232 N
3 Monitor 300 NaN
4 BB 2323 N
5 BB 2323 Y
6 Printer1 150 NaN
7 BB 2323 N
8 BB 2323 Y
9 Desk 250 NaN
10 AA 2323 Y
11 AA 2323 N
12 Printer2 23 NaN
13 DD 34 Y
14 Desk 45 NaN
15 BB 56 N
16 BB 56 Y
As you can see, there are now multiple rows for AA and BB. I just want the first value (or any one of the values) for 'AA' (and 'BB') to be pulled across, without altering the row order of the dataframe. In short, I don't want multiple rows. To clarify, my df2 has over 6000 rows and I don't know which entries are duplicated.
So the answer should look something like this:
product number to_add
0 Computer 1500 NaN
1 AA 232 Y
2 Monitor 300 NaN
3 BB 2323 N
4 Printer1 150 NaN
5 BB 2323 N
6 Desk 250 NaN
7 AA 2323 Y
8 Printer2 23 NaN
9 DD 34 Y
10 Desk 45 NaN
11 BB 56 N
Recommended answer
Use:
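# The left merge keeps every row of df; products duplicated in df2 yield extra rows
df_m = pd.merge(df, df2, on="product", how="left")
# Flag rows whose product exists in df2 and repeats the product of the row directly above
m = df_m["product"].isin(df2["product"]) & df_m["product"].eq(df_m["product"].shift())
# Drop the flagged rows and renumber the index
df_m = df_m[~m].reset_index(drop=True)
print(df_m)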
This prints:
product number to_add
0 Computer 1500 NaN
1 AA 232 Y
2 Monitor 300 NaN
3 BB 2323 N
4 Printer1 150 NaN
5 BB 2323 N
6 Desk 250 NaN
7 AA 2323 Y
8 Printer2 23 NaN
9 DD 34 Y
10 Desk 45 NaN
11 BB 56 N
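One caveat with the shift-based mask above: it treats any row that repeats the product of the row directly above it as a merge duplicate, so it could also drop a genuine row if df itself ever listed the same product on two consecutive rows. A possible alternative, offered here only as a sketch and not as part of the original answer, is to de-duplicate df2 before merging so the extra rows are never created. The name `lookup` is illustrative, and the sample data is repeated so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "product": ["Computer", "AA", "Monitor", "BB", "Printer1", "BB",
                "Desk", "AA", "Printer2", "DD", "Desk", "BB"],
    "number": [1500, 232, 300, 2323, 150, 2323, 250, 2323, 23, 34, 45, 56],
})
df2 = pd.DataFrame({
    "product": ["AA", "AA", "BB", "BB", "CC", "DD"],
    "to_add": ["Y", "N", "N", "Y", "N", "Y"],
})

# Keep only the first row per product, so each key can match at most once
lookup = df2.drop_duplicates(subset="product", keep="first")

# A left merge keeps the rows of df in their original order;
# validate= raises an error if the right-hand keys are not unique after all
result = df.merge(lookup, on="product", how="left", validate="many_to_one")
print(result)
```

Because each product appears once in `lookup`, the result has exactly as many rows as df, and nobody needs to know in advance which of the 6000+ entries in df2 are duplicated. Passing keep="last" instead would pull the last value for each product.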