How to match and merge two DataFrames whose values are completely different except for a single word (ABC with 10 rows, XYZ with 22550 rows)

Question

I have a DataFrame ABC with 10 rows and a DataFrame XYZ with 22550 rows.

ABC contains values like these:

    0                     | 1         | 2
0   sun is rising         | UNKNOWN   | 1465465
1   micheal has arrived   | UNKNOWN   | 324654
2   goal has been scored  | UNKNOWN   | 547854

and XYZ contains values like these:

    0         1 
0 sun       | password1
1 goal      | password2

....
....
.....
....
22550
22551  micheal   | password3

How can I match the keywords in column 0 of XYZ (sun, goal and micheal) against the text in column 0 of ABC, so that the password in column 1 of XYZ replaces UNKNOWN in column 1 of ABC?

The output I need:

    0                     | 1         | 2
0   sun is rising         | password1 | 1465465
1   micheal has arrived   | password3 | 324654
2   goal has been scored  | password2 | 547854

I tried the approaches below; each one is followed by the error it raised:

d = dict(zip(XYZ[0],XYZ[1]))

pat = (r'({})'.format('|'.join(d.keys())))
ABC[1]=ABC[0].str.extract(pat,expand=False).map(d)
print(ABC)

Error: TypeError: sequence item 16069: expected str instance, float found

from itertools import chain
abc.loc[:,1] = list(chain(*[xyz.loc[abc[0].str.contains(i),1] for i in xyz[0]]))

Error: IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match)

d = dict(zip(XYZ[0], XYZ[1]))
ABC[1] = [next(d.get(y) for y in x.split() if y in d) for x in ABC[0]]
print (ABC)

Error: StopIteration

Recommended answer

Pass a default value such as 'no match' to next(), so that a row with no matching word returns the default instead of raising StopIteration:

d = dict(zip(XYZ[0].str.lower(), XYZ[1]))
ABC[1] = [next(iter(d.get(y) for y in x.lower().split() if y in d),'no match') for x in ABC[0]]
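
The StopIteration in the question comes from calling next() without a default on a generator that yields nothing, which happens for any row whose words are all absent from the dictionary. Below is a minimal sketch of the same idea on made-up sample data; the frames, the third "rain all day" row and the helper name first_match are assumptions for illustration, not part of the original post.

```python
import pandas as pd

# Hypothetical sample frames standing in for the real ABC and XYZ.
ABC = pd.DataFrame({
    0: ["Sun is rising", "micheal has arrived", "rain all day"],
    1: ["UNKNOWN", "UNKNOWN", "UNKNOWN"],
})
XYZ = pd.DataFrame({
    0: ["sun", "goal", "micheal"],
    1: ["password1", "password2", "password3"],
})

# Lower-cased keyword -> password lookup table.
d = dict(zip(XYZ[0].str.lower(), XYZ[1]))

def first_match(text, default="no match"):
    """Return the password of the first whole word found in d, else default."""
    words = text.lower().split()
    # The second argument to next() is returned when the generator is empty.
    return next((d[w] for w in words if w in d), default)

ABC[1] = [first_match(x) for x in ABC[0]]
print(ABC)
```

Because the text is split on whitespace, only whole words match: a row containing "sunrise" would not pick up the password for "sun".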

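General solution, which first drops rows with missing values from XYZ and then matches each keyword as a case-insensitive substring: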

import re

XYZ = XYZ.dropna()
d = dict(zip(XYZ[0].str.lower(), XYZ[1]))
for k, v in d.items():
    ABC.loc[ABC[0].str.contains(re.escape(k), case=False, na=False), 1] = v  
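
The dropna() call matters because the float in the asker's TypeError is most likely a NaN (missing keyword) in XYZ[0], which '|'.join cannot concatenate. With the same cleanup, the regex-based first attempt from the question can also be made to work. The sketch below is one possible way to do that, assuming made-up sample data; the frames and the variable name found are hypothetical.

```python
import re

import pandas as pd

# Hypothetical sample frames; XYZ deliberately contains one missing keyword.
ABC = pd.DataFrame({
    0: ["Sun is rising", "micheal has arrived",
        "goal has been scored", "rain all day"],
    1: ["UNKNOWN"] * 4,
})
XYZ = pd.DataFrame({
    0: ["sun", "goal", None, "micheal"],
    1: ["password1", "password2", "password4", "password3"],
})

# Remove rows with missing values so every dictionary key is a string.
XYZ = XYZ.dropna()
d = dict(zip(XYZ[0].str.lower(), XYZ[1]))

# Escape each keyword so regex metacharacters are matched literally.
pat = "({})".format("|".join(re.escape(k) for k in d))

# Extract the first keyword found in each row, ignoring case.
found = ABC[0].str.extract(pat, flags=re.IGNORECASE, expand=False).str.lower()

# Map keywords to passwords; rows without a match keep their old value.
ABC[1] = found.map(d).fillna(ABC[1])
print(ABC)
```

This builds one combined pattern instead of looping over every keyword, so ABC[0] is scanned only once; like the loop above, it matches substrings rather than whole words.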


10-29 07:13