Question

On the API doc, http://docs.python.org/2/library/unicodedata.html#unicodedata.normalize. It says

The documentation is rather vague. Can someone explain the valid values with some examples?

Recommended answer

I find the documentation pretty clear, but here are a few code examples:

from unicodedata import normalize

print '%r' % normalize('NFD', u'\u00C7')  # decompose: convert Ç to "C + ̧"
print '%r' % normalize('NFC', u'C\u0327') # compose: convert "C + ̧" to Ç

Both 'D' (= decompose) forms convert a single precomposed character (like ä) into two characters (the base letter a plus a combining two-dot mark). Both 'C' (= compose) forms do the reverse.
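
As a minimal sketch of that round trip, the snippet below decomposes ä and composes it again. It assumes Python 3 (where str is already Unicode and print is a function), unlike the Python 2 snippets in this answer; the variable names are only illustrative.

```python
import unicodedata

composed = "\u00e4"  # ä stored as one precomposed code point
decomposed = unicodedata.normalize("NFD", composed)

# The decomposed string looks the same on screen but holds two code points.
print(len(composed), len(decomposed))  # 1 2
print([unicodedata.name(ch) for ch in decomposed])
# ['LATIN SMALL LETTER A', 'COMBINING DIAERESIS']

# Composing again restores the original single code point.
recomposed = unicodedata.normalize("NFC", decomposed)
print(recomposed == composed)   # True
print(composed == decomposed)   # False: same text, different code points
```

The last comparison is why normalizing both sides to the same form before comparing strings is a common precaution.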

The two "K" forms additionally convert characters that were added to Unicode for compatibility purposes. For example, to support software that cannot draw circles around symbols, there is a set of "circled numbers" such as ① (code point U+2460). Applying the canonical decomposition (NFD) to it leaves it unchanged:

print '%r' % normalize('NFD', u'\u2460')     # u'\u2460'

However, the compatibility decomposition (NFKD) will return the corresponding "compatible" character:

print '%r' % normalize('NFKD', u'\u2460')    # u'1'
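
To see all four valid values of form side by side, here is a small sketch that applies each one to the two characters used above (Ç and ①). It assumes Python 3; code_points is a hypothetical helper written for this illustration, not part of unicodedata.

```python
import unicodedata

sample = "\u00c7\u2460"  # Ç followed by ①


def code_points(text):
    """Render a string as space-separated U+XXXX code points (illustrative helper)."""
    return " ".join("U+%04X" % ord(ch) for ch in text)


# Apply every normalization form to the same input.
results = {
    form: unicodedata.normalize(form, sample)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}

for form, text in results.items():
    print(form, code_points(text))
# NFC U+00C7 U+2460
# NFD U+0043 U+0327 U+2460
# NFKC U+00C7 U+0031
# NFKD U+0043 U+0327 U+0031
```

Only the K forms replace ① with the plain digit 1, and only the D forms leave Ç split into C plus a combining cedilla.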

See http://en.wikipedia.org/wiki/Unicode_equivalence for more details.
