Problem description
I am using Python to automatically generate qsf files for Qualtrics online surveys. The qsf file requires Unicode characters to be escaped using the \u + hex convention: 'слово' = '\u0441\u043b\u043e\u0432\u043e'. Currently, I am achieving this with the following expression:
'слово'.encode('ascii','backslashreplace').decode('ascii')
The output is exactly what I need, but since this is a two-step process, I wondered if there is a more efficient way to get the same result.
Recommended answer
If you open your output file in binary mode ('wb'), it accepts bytes rather than str arguments:
s = 'слово'
with open('data.txt', 'wb') as f:
    f.write(s.encode('unicode_escape'))
    f.write(b'\n')  # add a line feed
This seems to do what you want:
$ cat data.txt
\u0441\u043b\u043e\u0432\u043e
and it avoids both the decode step and any translation that happens when writing Unicode to a text stream.
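One caveat that may be worth checking before switching: as far as I can tell, the two codecs agree for the example word but are not interchangeable for every input. The sketch below compares them; the `tricky` string is a made-up test value, not something from the question.

```python
s = 'слово'

# For purely Cyrillic input the two approaches produce identical bytes.
two_step = s.encode('ascii', 'backslashreplace')
one_step = s.encode('unicode_escape')
print(two_step == one_step)

# Hypothetical input containing a literal backslash and a real newline.
tricky = 'a\\b\nслово'

# backslashreplace only touches characters that ASCII cannot represent,
# so the backslash and the newline pass through unchanged.
kept = tricky.encode('ascii', 'backslashreplace')

# unicode_escape also escapes the backslash itself and turns the newline
# into the two characters backslash + 'n'.
escaped = tricky.encode('unicode_escape')

print(kept)
print(escaped)
```

If your survey text can contain backslashes, newlines, or tabs, compare both outputs against what Qualtrics expects before choosing one.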
Updated to use encode('unicode_escape') as per the suggestion of @J.F.Sebastian.
%timeit reports that it is quite a bit faster than encode('ascii', 'backslashreplace'):
In [18]: f = open('data.txt', 'wb')
In [19]: %timeit f.write(s.encode('unicode_escape'))
The slowest run took 224.43 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 1.55 µs per loop
In [20]: %timeit f.write(s.encode('ascii','backslashreplace'))
The slowest run took 9.13 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.37 µs per loop
In [21]: f.close()
Curiously, the spread between the slowest and fastest runs reported by %timeit is much larger for encode('unicode_escape') (about 224x) than for encode('ascii', 'backslashreplace') (about 9x), even though its best per-loop time is lower, so be sure to test both in your environment.
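If you are not working in IPython, a rough equivalent of the comparison can be run with the standard timeit module. This is only a sketch: the function names are placeholders I made up, it measures the encode call alone (without the file write), and the numbers will differ from the ones above depending on machine and Python version.

```python
import timeit

s = 'слово'

def via_unicode_escape():
    # One-step approach from the answer.
    return s.encode('unicode_escape')

def via_backslashreplace():
    # First half of the two-step approach from the question.
    return s.encode('ascii', 'backslashreplace')

CALLS = 100_000

for fn in (via_unicode_escape, via_backslashreplace):
    # Take the best of several repeats to reduce noise from other processes.
    best = min(timeit.repeat(fn, number=CALLS, repeat=5))
    print(f'{fn.__name__}: {best / CALLS * 1e6:.3f} µs per call')
```

Taking the minimum of several repeats mirrors the "best of 3" figure that %timeit prints.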