A more efficient way to make Unicode escape codes

Question

I am using Python to automatically generate qsf files for Qualtrics online surveys. The qsf file requires Unicode characters to be escaped using the \u + hex convention: 'слово' = '\u0441\u043b\u043e\u0432\u043e'. Currently, I am achieving this with the following expression:

'слово'.encode('ascii','backslashreplace').decode('ascii')

The output is exactly what I need, but since this is a two-step process, I wondered if there is a more efficient way to get the same result.
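To make the \u + hex convention concrete, here is a minimal sketch that builds the same escapes by hand from each character's code point and compares the result with the two-step expression. The helper name escape_manually is hypothetical, and the comparison is only expected to hold for characters in the range U+0100 to U+FFFF (such as Cyrillic); Python formats code points outside that range differently.

```python
def escape_manually(text):
    # Hypothetical helper: keep ASCII as-is and write every other
    # character as a backslash, 'u', and four lowercase hex digits.
    return ''.join(
        ch if ord(ch) < 128 else '\\u{:04x}'.format(ord(ch))
        for ch in text
    )

word = 'слово'

# The two-step expression from the question: encode to bytes, then
# decode back to a str that contains only ASCII characters.
two_step = word.encode('ascii', 'backslashreplace').decode('ascii')

print(two_step)
print(escape_manually(word) == two_step)
```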

Recommended answer

If you open your output file in binary mode ('wb'), it accepts bytes rather than str, so the encoded result can be written without decoding it first:

s = 'слово'
with open('data.txt','wb') as f:
    f.write(s.encode('unicode_escape'))
    f.write(b'\n')  # add a line feed

This seems to do what you want:

$ cat data.txt
\u0441\u043b\u043e\u0432\u043e

This avoids both the decode step and any translation that happens when writing Unicode to a text stream.
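One thing worth checking before switching codecs: the two encodings agree on the example word but not on every input. The sketch below compares them on a few sample strings chosen here for illustration (they are not from the original question). As far as I know, 'unicode_escape' also escapes backslashes and control characters such as newlines, while 'backslashreplace' only touches characters that ASCII cannot represent.

```python
# Sample inputs chosen for illustration only.
samples = ['слово', 'a\\b', 'line1\nline2', 'café']

for s in samples:
    via_escape = s.encode('unicode_escape')
    via_replace = s.encode('ascii', 'backslashreplace')
    # Show both results and whether they are byte-for-byte identical.
    print(repr(s), via_escape, via_replace, via_escape == via_replace)
```

Note that both codecs write 'é' as \xe9 rather than \u00e9. If the target format only accepts the \u form, characters below U+0100 would need separate handling.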

Updated to use encode('unicode_escape') as per the suggestion of @J.F.Sebastian.

%timeit reports that it is quite a bit faster than encode('ascii', 'backslashreplace'):

In [18]: f = open('data.txt', 'wb')

In [19]: %timeit f.write(s.encode('unicode_escape'))
The slowest run took 224.43 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 1.55 µs per loop

In [20]: %timeit f.write(s.encode('ascii','backslashreplace'))
The slowest run took 9.13 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.37 µs per loop

In [21]: f.close()

Curiously, the lag from timeit for encode('unicode_escape') is a lot longer than that from encode('ascii', 'backslashreplace') even though the per loop time is faster, so be sure to test both in your environment.
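For testing outside IPython, a rough equivalent can be written with the standard timeit module. This is only a sketch: it times the encode calls alone (no file writes), and the repeat and loop counts are arbitrary choices rather than values from the answer above. Absolute numbers will vary by machine and Python version.

```python
import timeit

s = 'слово'
loops = 100000

# Run each measurement several times and keep the best total,
# which is less sensitive to one-off slow runs.
t_escape = min(timeit.repeat(
    lambda: s.encode('unicode_escape'), repeat=5, number=loops))
t_replace = min(timeit.repeat(
    lambda: s.encode('ascii', 'backslashreplace'), repeat=5, number=loops))

# Convert the total time for all loops into microseconds per call.
print('unicode_escape:   {:.3f} us per loop'.format(t_escape / loops * 1e6))
print('backslashreplace: {:.3f} us per loop'.format(t_replace / loops * 1e6))
```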
