A more efficient way to produce Unicode escape codes

Question

I am using Python to automatically generate qsf files for Qualtrics online surveys. The qsf file requires Unicode characters to be escaped using the \u+hex convention: 'слово' = '\u0441\u043b\u043e\u0432\u043e'. Currently, I am achieving this with the following expression:

'слово'.encode('ascii','backslashreplace').decode('ascii')

The output is exactly what I need, but since this is a two-step process, I wondered if there is a more efficient way to get the same result.
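For reference, the two steps in that expression can be spelled out as follows. This is only a minimal sketch: the helper name `escape_non_ascii` is made up for illustration and does not come from the question.

```python
def escape_non_ascii(text: str) -> str:
    """Replace every non-ASCII character in text with a backslash escape."""
    # Step 1: encode to ASCII bytes; anything ASCII cannot represent
    # is substituted with a backslash escape by the error handler.
    ascii_bytes = text.encode('ascii', 'backslashreplace')
    # Step 2: decode the (now pure ASCII) bytes back into a str.
    return ascii_bytes.decode('ascii')


escaped = escape_non_ascii('слово')
print(escaped)  # \u0441\u043b\u043e\u0432\u043e
```

The second step exists only because the result is needed as a str; if bytes are acceptable, it can be dropped.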

Recommended answer

If you open your output file as 'wb', then it accepts a byte stream rather than unicode arguments:

s = 'слово'
with open('data.txt','wb') as f:
    f.write(s.encode('unicode_escape'))
    f.write(b'\n')  # add a line feed

This seems to do what you want:

$ cat data.txt
\u0441\u043b\u043e\u0432\u043e

and it avoids both the decode step and any translation that happens when writing Unicode to a text stream.
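To check the file contents without relying on `cat`, the bytes can be read back in binary mode. This is a sketch under one assumption: the temporary directory is a stand-in scratch location, not the path used in the answer.

```python
import os
import tempfile

s = 'слово'
# Hypothetical scratch location so the example does not touch real files.
path = os.path.join(tempfile.mkdtemp(), 'data.txt')

# Binary mode: write() takes bytes, so no decode step is needed.
with open(path, 'wb') as f:
    f.write(s.encode('unicode_escape'))
    f.write(b'\n')

# Read the raw bytes back to see exactly what landed on disk.
with open(path, 'rb') as f:
    raw = f.read()

print(raw)                        # b'\\u0441\\u043b\\u043e\\u0432\\u043e\n'
print(all(b < 128 for b in raw))  # True: the file is pure ASCII
```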

Updated to use encode('unicode_escape') as per the suggestion of @J.F.Sebastian.
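One thing worth checking before swapping codecs: the two approaches agree on text like 'слово', but `unicode_escape` also escapes some ASCII characters, such as newlines and the backslash itself, which `backslashreplace` leaves untouched. The sample strings below are illustrative only.

```python
# Illustrative inputs; only the first comes from the question.
samples = {
    'cyrillic': 'слово',
    'newline': 'a\nb',
    'backslash': 'a\\b',
}

results = {}
for name, text in samples.items():
    via_escape = text.encode('unicode_escape')
    via_replace = text.encode('ascii', 'backslashreplace')
    results[name] = (via_escape, via_replace)
    print(name, via_escape, via_replace, via_escape == via_replace)
```

If the survey text can contain newlines or literal backslashes, the two outputs will differ, so it is worth confirming which form the target file expects.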

%timeit reports that it is quite a bit faster than encode('ascii', 'backslashreplace'):

In [18]: f = open('data.txt', 'wb')

In [19]: %timeit f.write(s.encode('unicode_escape'))
The slowest run took 224.43 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 1.55 µs per loop

In [20]: %timeit f.write(s.encode('ascii','backslashreplace'))
The slowest run took 9.13 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.37 µs per loop

In [21]: f.close()

Curiously, timeit reports a much larger spread between the slowest and fastest runs for encode('unicode_escape') (about 224x) than for encode('ascii', 'backslashreplace') (about 9x), even though its per-loop time is faster, so be sure to test both in your environment.
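Outside IPython, a comparable measurement can be made with the standard `timeit` module. This is a rough sketch, not a reproduction of the session above: it times only the encode calls (not the file write), and the loop counts are arbitrary choices.

```python
import timeit

s = 'слово'
NUMBER = 100_000  # calls per run; arbitrary

candidates = {
    'unicode_escape': lambda: s.encode('unicode_escape'),
    'backslashreplace': lambda: s.encode('ascii', 'backslashreplace'),
}

results = {}
for name, func in candidates.items():
    # repeat() returns one total time per run; the minimum is the
    # least disturbed by other activity on the machine.
    best = min(timeit.repeat(func, number=NUMBER, repeat=5))
    results[name] = best / NUMBER
    print(f'{name}: {results[name] * 1e6:.3f} µs per call')
```

Absolute numbers will vary with the Python version and hardware, so only the relative ordering on your own machine is meaningful.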
