python - dict comprehension does not behave as expected

def tf(tokens):
    """ Compute TF
    Args:
        tokens (list of str): input list of tokens from tokenize
    Returns:
        dictionary: a dictionary of tokens to its TF values
    """
    li = {}
    total = len(tokens)
    li = {token: 1 if not token in li else li[token] + 1 for token in tokens }
    return {token: li[token]/ float(total) for token in li}

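Basically, I want a dictionary whose keys are the tokens and whose values are the frequency of each token in the token list.

I want my comprehension to check whether the token is already present. If it is, increment its value by 1; otherwise create it with a value of 1.

For some reason, every key ends up with a value of 1, no matter how many times it occurs in the token list.

Could you help me see why this happens?

I could solve it with a loop, but I want to master dict comprehensions.

Thank you very much!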

Best answer

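A comprehension, such as a list or dict comprehension, is a builder expression: the object is constructed only after the expression has been fully evaluated. Only then is the symbol name bound to a reference to the resulting dictionary.

In your particular example, you refer to the symbol li, which names an empty dictionary. Throughout the evaluation of the expression, li keeps referring to that empty dictionary, which means the dict comprehension can equivalently be written as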

li = {token: 1 if not token in {} else {}[token] + 1 for token in tokens}

or, since a membership test against an empty dictionary is always false, simplified to

li = {token: 1 for token in tokens}
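
To see this evaluation order directly, here is a minimal sketch. The token list is a made-up sample, not data from the question.

```python
tokens = ["a", "b", "a"]  # hypothetical sample input

li = {}
# The right-hand side is evaluated in full while `li` still names the empty dict;
# the name is rebound only after the whole comprehension has finished.
li = {token: 1 if token not in li else li[token] + 1 for token in tokens}

print(li)  # {'a': 1, 'b': 1}
```

The second "a" overwrites the first entry with the same value 1, because the membership test never sees the dictionary that is being built.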

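What you need is either a ready-made library utility or a state-based solution.

Fortunately, the standard library module collections provides a class called Counter, which was written and designed for exactly this purpose.

With it, your function becomes simply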

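from collections import Counter

def tf(tokens):
    """ Compute TF
    Args:
        tokens (list of str): input list of tokens from tokenize
    Returns:
        dictionary: a dictionary of tokens to its TF values
    """
    # Counter tallies every token in a single pass
    counts = Counter(tokens)
    total = float(len(tokens))
    # Divide each count by the number of tokens to get the term frequency
    return {token: count / total for token, count in counts.items()}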
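
As a quick check of what Counter produces, the following sketch runs it on a small made-up token list (the sample words are an assumption for illustration only).

```python
from collections import Counter

tokens = ["to", "be", "or", "not", "to", "be"]  # hypothetical sample input
counts = Counter(tokens)

print(counts["to"])       # 2
print(counts["missing"])  # 0, absent keys count as zero instead of raising KeyError
print(sum(counts.values()) == len(tokens))  # True
```

Counter is a dict subclass, so the result can be used anywhere an ordinary dictionary of counts is expected.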

A state-based solution just needs an external counter that is updated for every occurrence of each unique token

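from collections import defaultdict

def tf(tokens):
    """ Compute TF
    Args:
        tokens (list of str): input list of tokens from tokenize
    Returns:
        dictionary: a dictionary of tokens to its TF values
    """
    # defaultdict(int) starts every unseen token at 0
    counter = defaultdict(int)
    for token in tokens:
        counter[token] += 1
    total = float(len(tokens))
    # Divide each count by the number of tokens to get the term frequency
    return {token: count / total for token, count in counter.items()}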

Or, if you prefer not to use defaultdict

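def tf(tokens):
    """ Compute TF
    Args:
        tokens (list of str): input list of tokens from tokenize
    Returns:
        dictionary: a dictionary of tokens to its TF values
    """
    counter = {}
    for token in tokens:
        # dict.get returns 0 for a token that has not been seen yet
        counter[token] = counter.get(token, 0) + 1
    total = float(len(tokens))
    # Divide each count by the number of tokens to get the term frequency
    return {token: count / total for token, count in counter.items()}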
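
If the goal is specifically to practise dict comprehensions, one possible sketch is to keep the comprehension stateless and let list.count do the counting. The name tf_comprehension and the sample tokens are hypothetical, and this variant rescans the list once per unique token, so it is likely slower than the counter-based versions on long inputs.

```python
def tf_comprehension(tokens):
    """Hypothetical variant: term frequencies from a single dict comprehension."""
    total = float(len(tokens))
    # Iterate over the unique tokens; list.count rescans the whole list each time,
    # so no running state is needed inside the comprehension.
    return {token: tokens.count(token) / total for token in set(tokens)}

result = tf_comprehension(["a", "b", "a", "c"])
print(sorted(result.items()))  # [('a', 0.5), ('b', 0.25), ('c', 0.25)]
```

The comprehension works here because each value depends only on the input list, never on the partially built dictionary.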

Regarding python - dict comprehension does not behave as expected, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/30964404/