python - Python encoding: UTF-8 'Miały' is not equal to 'Miały'

Hi!
This question is about strings and encodings.
I have a file 'build_list.txt':

Miały password
something not important
stuff stuff
stuff

and a script, file.py, that reads it:

import csv

with open('build_list.txt', 'r', encoding='utf-8') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')

    for i, row in enumerate(spamreader):
        if i == 0:
            username = str(row[0]).strip()
            password = str(row[1]).strip()
            if username != "Miały":
                print("FAIL: {!r} != {!r}".format(
                    username.encode('utf-8'), "Miały".encode('utf-8')))

It prints:

FAIL: b'\xef\xbb\xbfMia\xc5\x82y' != b'Mia\xc5\x82y'

Why does this happen, and how do I fix it?
I use PyCharm and saved the txt file on Windows with UTF-8 encoding (saving as ANSI produces strange characters).

Best answer

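The file begins with an invisible byte order mark (BOM): the three bytes \xef\xbb\xbf shown in your output, which decode to the single character U+FEFF. Because the plain utf-8 codec does not strip it, it is treated as part of the first field of the first record. UTF-8-encoded text files aren't supposed to have byte order marks, but Windows programs have a bad habit of inserting them anyway. You can make Python ignore the BOM by opening the file with the utf-8-sig encoding instead of plain utf-8: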

with open('build_list.txt', 'rt', encoding='utf-8-sig') as csvfile:
    # ...
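
To see the difference without touching the real file, here is a minimal sketch that writes a throwaway file starting with the UTF-8 BOM and reads its first field with each codec. The helper name read_first_field and the temporary file are made up for this illustration; only open(), codecs.BOM_UTF8, and the two codec names come from Python itself.

```python
import codecs
import os
import tempfile


def read_first_field(path, encoding):
    """Return the first space-separated field of the file's first line."""
    with open(path, 'r', encoding=encoding) as f:
        return f.readline().split(' ')[0]


# Create a temporary file whose first three bytes are the UTF-8 BOM (EF BB BF),
# mimicking what some Windows editors write when saving as "UTF-8".
with tempfile.NamedTemporaryFile(delete=False, suffix='.txt') as tmp:
    tmp.write(codecs.BOM_UTF8 + 'Miały password\n'.encode('utf-8'))
    demo_path = tmp.name

with_bom = read_first_field(demo_path, 'utf-8')         # BOM kept as U+FEFF
without_bom = read_first_field(demo_path, 'utf-8-sig')  # BOM stripped
os.remove(demo_path)

print(with_bom == 'Miały')      # False: the string starts with U+FEFF
print(without_bom == 'Miały')   # True
```

With plain utf-8 the first field is one character longer than it looks, which is why the comparison in the question fails; utf-8-sig drops that leading character on read.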

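However, don't use this encoding when writing files unless you have to interoperate with a program that cannot recognize a file as UTF-8 without a byte order mark.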
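
The reason for that advice can be checked directly on byte strings. This is a small sketch using only str.encode, bytes.decode, and codecs.BOM_UTF8; the variable names are illustrative.

```python
import codecs

text = 'Miały password\n'

plain = text.encode('utf-8')        # no BOM
signed = text.encode('utf-8-sig')   # BOM prepended on encode

# Writing with utf-8-sig adds the three BOM bytes in front of the data.
print(signed == codecs.BOM_UTF8 + plain)   # True

# Reading with utf-8-sig accepts both forms, so it is a safe choice for input
# whether or not the producer added a BOM.
print(plain.decode('utf-8-sig') == text)   # True
print(signed.decode('utf-8-sig') == text)  # True
```

So utf-8-sig is tolerant when reading but always emits a BOM when writing, which is what passes the same problem on to the next program that reads the file as plain utf-8.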

Source: the original question on Stack Overflow: https://stackoverflow.com/questions/44509532/