Python group by

Problem description

Assume I have a list of data pairs, where index 0 of each pair is the value and index 1 is the type:

input = [
          ('11013331', 'KAT'), 
          ('9085267',  'NOT'), 
          ('5238761',  'ETH'), 
          ('5349618',  'ETH'), 
          ('11788544', 'NOT'), 
          ('962142',   'ETH'), 
          ('7795297',  'ETH'), 
          ('7341464',  'ETH'), 
          ('9843236',  'KAT'), 
          ('5594916',  'ETH'), 
          ('1550003',  'ETH')
        ]

I want to group them by their type (the string at index 1), like this:

result = [ 
           { 
             'type': 'KAT', 
             'items': ['11013331', '9843236'] 
           },
           {
             'type': 'NOT', 
             'items': ['9085267', '11788544'] 
           },
           {
             'type': 'ETH', 
             'items': ['5238761', '5349618', '962142', '7795297', '7341464', '5594916', '1550003'] 
           }
         ]

How can I achieve this in an efficient way?

Solution

Do it in two steps. First, build a dictionary that maps each type to the list of its values.

>>> input = [('11013331', 'KAT'), ('9085267', 'NOT'), ('5238761', 'ETH'), ('5349618', 'ETH'), ('11788544', 'NOT'), ('962142', 'ETH'), ('7795297', 'ETH'), ('7341464', 'ETH'), ('9843236', 'KAT'), ('5594916', 'ETH'), ('1550003', 'ETH')]
>>> from collections import defaultdict
>>> res = defaultdict(list)
>>> for v, k in input: res[k].append(v)
...

Then, convert that dictionary into the expected format.

>>> [{'type':k, 'items':v} for k,v in res.items()]
[{'items': ['9085267', '11788544'], 'type': 'NOT'}, {'items': ['5238761', '5349618', '962142', '7795297', '7341464', '5594916', '1550003'], 'type': 'ETH'}, {'items': ['11013331', '9843236'], 'type': 'KAT'}]
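The two steps above can be wrapped in one function. This is only a minimal sketch: the name group_by_type and the shortened sample list are made up for illustration and are not part of the original answer.

```python
from collections import defaultdict


def group_by_type(pairs):
    """Group (value, type) pairs into a list of {'type', 'items'} dicts.

    group_by_type is a hypothetical helper name used for this sketch.
    """
    grouped = defaultdict(list)
    # Step 1: collect every value under its type.
    for value, kind in pairs:
        grouped[kind].append(value)
    # Step 2: convert the mapping into the requested list-of-dicts format.
    return [{'type': kind, 'items': values} for kind, values in grouped.items()]


# Shortened sample data, taken from the question's input.
pairs = [('11013331', 'KAT'), ('9085267', 'NOT'),
         ('5238761', 'ETH'), ('9843236', 'KAT')]
result = group_by_type(pairs)
```

Each pair is visited once, so the grouping step takes time linear in the length of the input.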

It is also possible with itertools.groupby but it requires the input to be sorted first.

>>> from itertools import groupby
>>> from operator import itemgetter
>>> sorted_input = sorted(input, key=itemgetter(1))
>>> groups = groupby(sorted_input, key=itemgetter(1))
>>> [{'type':k, 'items':[x[0] for x in v]} for k, v in groups]
[{'items': ['5238761', '5349618', '962142', '7795297', '7341464', '5594916', '1550003'], 'type': 'ETH'}, {'items': ['11013331', '9843236'], 'type': 'KAT'}, {'items': ['9085267', '11788544'], 'type': 'NOT'}]
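To see why the sort matters: groupby only merges consecutive items that share a key, so unsorted input can produce the same key more than once. The three-element list below is made-up sample data used only to show the effect.

```python
from itertools import groupby
from operator import itemgetter

# Made-up sample: the two 'KAT' entries are not adjacent.
pairs = [('1', 'KAT'), ('2', 'ETH'), ('3', 'KAT')]

# Without sorting, 'KAT' starts a new group each time it reappears.
unsorted_keys = [key for key, _ in groupby(pairs, key=itemgetter(1))]

# After sorting by the same key, each type forms exactly one group.
sorted_pairs = sorted(pairs, key=itemgetter(1))
sorted_keys = [key for key, _ in groupby(sorted_pairs, key=itemgetter(1))]

print(unsorted_keys)  # ['KAT', 'ETH', 'KAT']
print(sorted_keys)    # ['ETH', 'KAT']
```

The sort makes this approach O(n log n), whereas the dictionary approach needs only a single pass.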

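Note that neither of these is guaranteed to respect the original order of the keys: groupby returns them in sorted order, and on Python versions before 3.7 a dict (including defaultdict) iterates in arbitrary order, as the outputs above show. If you need to keep the order on such versions, use an OrderedDict.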

>>> from collections import OrderedDict
>>> res = OrderedDict()
>>> for v, k in input:
...   if k in res: res[k].append(v)
...   else: res[k] = [v]
... 
>>> [{'type':k, 'items':v} for k,v in res.items()]
[{'items': ['11013331', '9843236'], 'type': 'KAT'}, {'items': ['9085267', '11788544'], 'type': 'NOT'}, {'items': ['5238761', '5349618', '962142', '7795297', '7341464', '5594916', '1550003'], 'type': 'ETH'}]
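On Python 3.7 and later, a plain dict keeps insertion order as part of the language, so the same first-seen ordering should be achievable without OrderedDict. The following is a sketch under that assumption; the function name group_preserving_order is hypothetical.

```python
def group_preserving_order(pairs):
    """Group (value, type) pairs, keeping types in first-seen order.

    Assumes Python 3.7+, where dict preserves insertion order.
    group_preserving_order is a hypothetical name for this sketch.
    """
    grouped = {}
    for value, kind in pairs:
        # setdefault creates the list the first time a type is seen.
        grouped.setdefault(kind, []).append(value)
    return [{'type': kind, 'items': values} for kind, values in grouped.items()]


# Shortened sample data, taken from the question's input.
pairs = [('11013331', 'KAT'), ('9085267', 'NOT'), ('5238761', 'ETH'),
         ('5349618', 'ETH'), ('11788544', 'NOT'), ('9843236', 'KAT')]
ordered = group_preserving_order(pairs)
types_in_order = [group['type'] for group in ordered]
```

With this sample, types_in_order follows the order in which each type first appears in pairs.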
