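As you can see, the inner merge below drops values from both frames because some keys do not match. What I want is to keep a record of the mismatched entries from the left frame and the right frame, and I don't know how to do that.
Left frame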

   key left_value
0    0          a
1    1          b
2    2          c
3    3          d
4    4          e

Right frame
   key right_value
0    2           f
1    3           g
2    4           h
3    5           i
4    6           j

pd.merge(left_frame, right_frame, on='key', how='inner')

**Expected output 1:**
   key left_value right_value
0    2          c           f
1    3          d           g
2    4          e           h

**Expected output 2:**
   key left_value right_value      _merge
0    0          a         NaN   left_only
1    1          b         NaN   left_only
5    5        NaN           i  right_only
6    6        NaN           j  right_only

So basically I want two dataframes: one for the 'inner' result and another for the mismatched rows.

Best answer

If you change the merge type to 'outer' and pass indicator=True, the added _merge column shows which frame each unmatched row came from:

In [193]:
pd.merge(left, right, how='outer', indicator=True)

Out[193]:
   key left_value right_value      _merge
0    0          a         NaN   left_only
1    1          b         NaN   left_only
2    2          c           f        both
3    3          d           g        both
4    4          e           h        both
5    5        NaN           i  right_only
6    6        NaN           j  right_only
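
For anyone who wants to reproduce the transcript above, here is a minimal self-contained sketch. The question never shows how the frames were built, so the two `pd.DataFrame` constructions are an assumption reconstructed from the printed tables, and `on="key"` is spelled out even though the answer relies on the default of joining on the shared column names.

```python
import pandas as pd

# Assumed construction: rebuilt from the tables printed in the question.
left = pd.DataFrame({"key": range(5), "left_value": list("abcde")})
right = pd.DataFrame({"key": range(2, 7), "right_value": list("fghij")})

# An outer merge keeps every key from both frames; indicator=True adds a
# "_merge" column with the values left_only, right_only, or both.
merged = pd.merge(left, right, on="key", how="outer", indicator=True)
print(merged)
```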

You can group by this column and call count:
In [194]:
pd.merge(left, right, how='outer', indicator=True).groupby('_merge').count()

Out[194]:
            key  left_value  right_value
_merge
left_only     2           2            0
right_only    2           0            2
both          3           3            3
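
Note that `count()` reports non-null values per column, which is why `right_value` shows 0 in the `left_only` row. If only the number of rows in each category is needed, `value_counts()` on the `_merge` column is one possible alternative. The sketch below is not part of the original answer, and the frame construction is again assumed from the tables in the question.

```python
import pandas as pd

# Assumed construction: rebuilt from the tables printed in the question.
left = pd.DataFrame({"key": range(5), "left_value": list("abcde")})
right = pd.DataFrame({"key": range(2, 7), "right_value": list("fghij")})

merged = pd.merge(left, right, on="key", how="outer", indicator=True)

# One count per category of the indicator column.
counts = merged["_merge"].value_counts()
print(counts)

# Individual counts can be looked up by category label.
n_left_only = int(counts.loc["left_only"])
n_right_only = int(counts.loc["right_only"])
n_both = int(counts.loc["both"])
```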

If you want to filter the merged frame and keep each part:
In [198]:
merged = pd.merge(left, right, how='outer', indicator=True)
merged

Out[198]:
   key left_value right_value      _merge
0    0          a         NaN   left_only
1    1          b         NaN   left_only
2    2          c           f        both
3    3          d           g        both
4    4          e           h        both
5    5        NaN           i  right_only
6    6        NaN           j  right_only

In [199]:
both = merged[merged['_merge'] == 'both']
both

Out[199]:
   key left_value right_value _merge
2    2          c           f   both
3    3          d           g   both
4    4          e           h   both

In [200]:
other = merged[merged['_merge'] != 'both']
other

Out[200]:
   key left_value right_value      _merge
0    0          a         NaN   left_only
1    1          b         NaN   left_only
5    5        NaN           i  right_only
6    6        NaN           j  right_only
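
The filtering steps above could be wrapped into a small helper that returns both frames the question asks for. This is only a sketch: `split_merge` is a hypothetical name, not a pandas function, and dropping `_merge` and resetting the index on the matched part are choices made here so that it lines up with expected output 1.

```python
import pandas as pd


def split_merge(left, right, on):
    """Return (matched, mismatched) frames from one outer merge.

    Hypothetical helper for illustration; not part of pandas.
    """
    merged = pd.merge(left, right, on=on, how="outer", indicator=True)
    is_both = merged["_merge"] == "both"
    # Matched rows: same content as how="inner", without the indicator column.
    matched = merged[is_both].drop(columns="_merge").reset_index(drop=True)
    # Mismatched rows keep "_merge" so their origin stays visible.
    mismatched = merged[~is_both]
    return matched, mismatched


# Assumed construction: rebuilt from the tables printed in the question.
left = pd.DataFrame({"key": range(5), "left_value": list("abcde")})
right = pd.DataFrame({"key": range(2, 7), "right_value": list("fghij")})

matched, mismatched = split_merge(left, right, on="key")
print(matched)
print(mismatched)
```

Either frame can then be written to disk, for example with `DataFrame.to_csv`.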

Regarding python - keeping track of (saving) the mismatched entries of dataset_a and dataset_b using Python Pandas, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/35800790/
