Can someone explain how unicodedata.normalize(form, unistr) works, with examples?

Question

The API documentation for unicodedata.normalize is at http://docs.python.org/2/library/unicodedata.html#unicodedata.normalize.

The documentation is rather vague. Can someone explain the valid values with some examples?

Recommended answer

I find the documentation pretty clear, but here are a few code examples:

from unicodedata import normalize

print '%r' % normalize('NFD', u'\u00C7')   # decompose: Ç becomes "C" + combining cedilla, prints u'C\u0327'
print '%r' % normalize('NFC', u'C\u0327')  # compose: "C" + combining cedilla becomes Ç, prints u'\xc7'

Both 'D' (= decompose) forms, NFD and NFKD, convert a single precomposed character (like ä) into two characters: the base letter a followed by a combining diaeresis (the two dots). Both 'C' (= compose) forms, NFC and NFKC, do the reverse.
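To make the compose/decompose round trip concrete, here is a minimal sketch. It assumes Python 3, where every str is already Unicode, so the u'' prefix and the print statement from the snippets above are not needed. The variable names are illustrative only.

```python
from unicodedata import normalize

composed = "\u00e4"      # ä as one precomposed code point
decomposed = "a\u0308"   # "a" followed by COMBINING DIAERESIS

nfd = normalize("NFD", composed)    # split into base letter + combining mark
nfc = normalize("NFC", decomposed)  # merge back into the precomposed form

print(len(composed), len(nfd))   # 1 2
print(nfd == decomposed)         # True
print(nfc == composed)           # True
print(composed == decomposed)    # False: same rendering, different code points
```

The last line is the usual reason to normalize at all: two strings that look identical compare unequal until both are brought into the same form.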

The two "K" forms, NFKC and NFKD, additionally convert characters that were added to Unicode for compatibility purposes. For example, to support software that cannot draw circles around symbols, there is a set of "circled numbers", like ① (code point U+2460). When we apply the canonical decomposition (NFD) to it, it doesn't do anything:

print '%r' % normalize('NFD', u'\u2460')     # u'\u2460'
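One way to see why NFD leaves ① alone is to look at the decomposition mapping stored in the Unicode database, which unicodedata.decomposition() returns as a string. The sketch below assumes Python 3; the variable names are illustrative only.

```python
import unicodedata

# Canonical mapping: plain list of code points, applied by NFD and NFKD.
canonical = unicodedata.decomposition("\u00c7")
# Compatibility mapping: prefixed with a tag such as <circle>, applied only by the K forms.
compat = unicodedata.decomposition("\u2460")

print(canonical)  # 0043 0327
print(compat)     # <circle> 0031

# NFD skips tagged (compatibility) mappings, so the circled digit is unchanged.
unchanged = unicodedata.normalize("NFD", "\u2460")
print(unchanged == "\u2460")  # True
```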

However, the compatibility decomposition (NFKD) returns the corresponding "compatible" character, here the plain digit 1:

print '%r' % normalize('NFKD', u'\u2460')    # u'1'
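Putting the four valid values side by side may help. The following sketch, again assuming Python 3, applies each form to a short sample string made of Ç and ① and prints the resulting code points. The sample string and the results dictionary are illustrative choices, not part of the original answer.

```python
from unicodedata import normalize

sample = "\u00c7\u2460"  # Ç followed by ①

# Apply every valid form to the same input.
results = {form: normalize(form, sample) for form in ("NFC", "NFD", "NFKC", "NFKD")}

for form, text in results.items():
    print(form, [hex(ord(ch)) for ch in text])

# NFC  ['0xc7', '0x2460']          composed, ① kept
# NFD  ['0x43', '0x327', '0x2460'] decomposed, ① kept
# NFKC ['0xc7', '0x31']            composed, ① replaced by 1
# NFKD ['0x43', '0x327', '0x31']   decomposed, ① replaced by 1
```

Note that the K forms lose information: once ① has become 1, no normalization form turns it back into the circled digit.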

See the linked reference for more details.
