Problem description

Hi, this is a follow-up to one of my previous questions: "How do I perform a VLOOKUP-equivalent operation on my dataframe with some additional conditions?"

As in the other question, my first dataframe is:

import pandas as pd

products = ['Computer', 'AA', 'Monitor', 'BB', 'Printer1', 'BB', 'Desk', 'AA', 'Printer2', 'DD', 'Desk', 'BB']
numbers = [1500, 232, 300, 2323, 150, 2323, 250, 2323, 23, 34, 45, 56]
df = pd.DataFrame(products, columns=['product'])
df['number'] = numbers

Now suppose my second dataframe has multiple values for, say, 'AA', as shown below:

list_n = ['AA','AA','BB','BB','CC','DD']
list_n2 = ['Y','N','N','Y','N','Y']

df2 = pd.DataFrame(list_n,columns=['product'])
df2['to_add'] = list_n2

This is what it looks like:

  product to_add
0      AA      Y
1      AA      N
2      BB      N
3      BB      Y
4      CC      N
5      DD      Y

When I run pd.merge(df, df2, on="product", how="left"), I get:

     product  number to_add
0   Computer    1500    NaN
1         AA     232      Y
2         AA     232      N
3    Monitor     300    NaN
4         BB    2323      N
5         BB    2323      Y
6    Printer1     150    NaN
7         BB    2323      N
8         BB    2323      Y
9       Desk     250    NaN
10        AA    2323      Y
11        AA    2323      N
12   Printer2      23    NaN
13        DD      34      Y
14      Desk      45    NaN
15        BB      56      N
16        BB      56      Y
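The 17 rows above follow from how a left merge pairs keys: each row of df is emitted once for every matching row in df2, or once with NaN when nothing matches. A minimal sketch that predicts the row count from the key frequencies in df2 (the data is copied from the question; `matches` and `expected_rows` are illustrative names, not part of any API):

```python
import pandas as pd

# Data copied from the question
df = pd.DataFrame({
    "product": ['Computer', 'AA', 'Monitor', 'BB', 'Printer1', 'BB',
                'Desk', 'AA', 'Printer2', 'DD', 'Desk', 'BB'],
    "number": [1500, 232, 300, 2323, 150, 2323, 250, 2323, 23, 34, 45, 56],
})
df2 = pd.DataFrame({
    "product": ['AA', 'AA', 'BB', 'BB', 'CC', 'DD'],
    "to_add": ['Y', 'N', 'N', 'Y', 'N', 'Y'],
})

# How many df2 rows share each product key
matches = df2["product"].value_counts()

# Left merge: one output row per matching df2 row, or one NaN-filled row if there is no match
expected_rows = df["product"].map(matches).fillna(1).astype(int).sum()

merged = pd.merge(df, df2, on="product", how="left")
print(expected_rows, len(merged))  # 17 17
```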

As you can see, there are now multiple rows for AA and BB. I only want the first value (or any one of the values) for 'AA' (and 'BB') to be pulled across, without altering the row order of the dataframe, of course. In short, I don't want multiple rows. Just to clarify: my df2 has over 6,000 rows and I don't know which entries are duplicated.
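With thousands of rows in df2, the repeated keys can be listed programmatically instead of by eye. A sketch, assuming the same df and df2 as in the question: `Series.duplicated(keep=False)` flags every occurrence of a repeated key, and passing `validate="many_to_one"` to `pd.merge` makes the merge raise `pandas.errors.MergeError` instead of silently adding rows. The names `dup_mask`, `duplicated_keys` and `raised` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "product": ['Computer', 'AA', 'Monitor', 'BB', 'Printer1', 'BB',
                'Desk', 'AA', 'Printer2', 'DD', 'Desk', 'BB'],
    "number": [1500, 232, 300, 2323, 150, 2323, 250, 2323, 23, 34, 45, 56],
})
df2 = pd.DataFrame({
    "product": ['AA', 'AA', 'BB', 'BB', 'CC', 'DD'],
    "to_add": ['Y', 'N', 'N', 'Y', 'N', 'Y'],
})

# keep=False marks every occurrence of a repeated key, not only the later ones
dup_mask = df2["product"].duplicated(keep=False)
duplicated_keys = sorted(df2.loc[dup_mask, "product"].unique())
print(duplicated_keys)  # ['AA', 'BB']

# validate="many_to_one" requires the merge keys to be unique in df2
raised = False
try:
    pd.merge(df, df2, on="product", how="left", validate="many_to_one")
except pd.errors.MergeError as exc:
    raised = True
    print("MergeError:", exc)
```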

So the answer should look something like this:

     product  number to_add
0   Computer    1500    NaN
1         AA     232      Y
2    Monitor     300    NaN
3         BB    2323      N
4    Printer1     150    NaN
5         BB    2323      N
6       Desk     250    NaN
7         AA    2323      Y
8    Printer2      23    NaN
9         DD      34      Y
10      Desk      45    NaN
11        BB      56      N

Recommended answer

Use:

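# Plain left merge: each AA/BB row is repeated once per match in df2
df_m = pd.merge(df, df2, on="product", how="left")

# Flag rows whose product exists in df2 and equals the product on the previous row.
# Caveat: this also flags a genuine row if df itself lists the same product on consecutive rows.
m = df_m["product"].isin(df2["product"]) & df_m["product"].eq(df_m["product"].shift())
# Drop the flagged repeats and renumber the index
df_m = df_m[~m].reset_index(drop=True)
print(df_m)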

This prints:

     product  number to_add
0   Computer    1500    NaN
1         AA     232      Y
2    Monitor     300    NaN
3         BB    2323      N
4   Printer1     150    NaN
5         BB    2323      N
6       Desk     250    NaN
7         AA    2323      Y
8   Printer2      23    NaN
9         DD      34      Y
10      Desk      45    NaN
11        BB      56      N
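The mask above assumes the extra rows produced by the merge sit directly below the original one, so it also drops a genuine row whenever df itself lists the same product on consecutive rows. A different technique, offered here as an alternative rather than as the answerer's method, is to deduplicate df2 with `drop_duplicates` before merging, so each key matches at most one row. A minimal sketch; `merge_first_match` is a hypothetical helper name, and the `map` variant is a VLOOKUP-style equivalent:

```python
import pandas as pd

df = pd.DataFrame({
    "product": ['Computer', 'AA', 'Monitor', 'BB', 'Printer1', 'BB',
                'Desk', 'AA', 'Printer2', 'DD', 'Desk', 'BB'],
    "number": [1500, 232, 300, 2323, 150, 2323, 250, 2323, 23, 34, 45, 56],
})
df2 = pd.DataFrame({
    "product": ['AA', 'AA', 'BB', 'BB', 'CC', 'DD'],
    "to_add": ['Y', 'N', 'N', 'Y', 'N', 'Y'],
})


def merge_first_match(left, right, key):
    """Left-merge `right` onto `left`, using only the first `right` row for each key."""
    right_unique = right.drop_duplicates(subset=key, keep="first")
    # validate raises MergeError if right_unique still contained repeated keys
    return left.merge(right_unique, on=key, how="left", validate="many_to_one")


result = merge_first_match(df, df2, "product")
print(result)

# VLOOKUP-style variant: map each product to the first to_add value for that key
lookup = df2.drop_duplicates(subset="product").set_index("product")["to_add"]
mapped = df["product"].map(lookup)

# Contrast with the shift-based mask when df repeats a product on consecutive rows
df_adj = pd.DataFrame({"product": ["AA", "AA"], "number": [1, 2]})
shifted = pd.merge(df_adj, df2, on="product", how="left")
m = shifted["product"].isin(df2["product"]) & shifted["product"].eq(shifted["product"].shift())
shifted = shifted[~m].reset_index(drop=True)
print(len(shifted))  # 1: the row with number 2 is lost
print(len(merge_first_match(df_adj, df2, "product")))  # 2
```

Because `how="left"` keeps the row order of df, `result` has the same 12 rows as the expected output in the question. Passing `keep="last"` to `drop_duplicates` would pull the last value per key instead of the first.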
