corpus = """In the US 555-0198 and 1-206-5705-0100 are examples fictitious numbers.
            In the UK, 044-113-496-1834 is a fictitious number.
            In Ireland, the number 353-020-917-1234 is fictitious.
            And in Australia, 061-970-654-321 is a fictitious number.
            311 is a joke."""


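I'm new to Python and am learning regular expressions. I'm trying to change all 7-, 11-, 12- and 13-digit numbers to zeros, and I want the result to still look like a phone number, for example changing 555-0198 to 000-0000. Is there a way to leave 311 as it is instead of turning it into zeros? Below is what I have come up with so far.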

At first I tried this, but it turns every digit into zero:

    for word in corpus.split():
        nums = re.sub(r"(\d)", "0", word)
        print(nums)


Then I tried this, but I realized that it does not handle the 11- and 13-digit numbers correctly:

    def sub_nums():
        for word in corpus.split():
            nums = re.sub(r"(\d{1,4})-+(\d{1,4})", "000-0000", word)
            print(nums)
    sub_nums()

Best answer

The regex I used is:

r'(?<!\S)(?:(?=(-*\d-*){7}(\s|\Z))[\d-]+|(?=(-*\d-*){11}(\s|\Z))[\d-]+|(?=(-*\d-*){12}(\s|\Z))[\d-]+|(?=(-*\d-*){13}(\s|\Z))[\d-]+)'


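The 7-, 11-, 12- and 13-digit phone numbers all follow the same repeated "motif" or pattern, so I will only explain the pattern for 7-digit phone numbers: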


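(?<!\S) is a negative lookbehind that applies to all of the alternatives. It says that the phone number must not be preceded by a non-whitespace character. This is a double negative: it is almost the same as saying the phone number must be preceded by whitespace, except that it also allows the phone number to begin at the start of the string. The alternative would be the equivalent positive lookbehind (?<=\s|\A), which says that the phone number must be preceded by whitespace or by the start of the string. That, however, is a variable-length lookbehind, which the regex engine that ships with Python does not support (the regex package from the PyPI repository does).
(?=(-*\d-*){7}(\s|\Z)) is the lookahead for 7-digit phone numbers. It requires that the following characters consist only of digits and hyphens, followed by whitespace or the end of the string, and that they contain exactly 7 digits.
[\d-]+ then actually matches those digits and hyphens in the input.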
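
To see the three pieces working together, here is a minimal sketch that uses only the 7-digit branch of the pattern. The names seven, first_match and lookbehind_rejected are made up for this illustration, and the capturing groups are written as non-capturing ones, which does not change what is matched. The last part tries the positive-lookbehind variant to show how the standard re module reacts to it.

```python
import re

# Only the 7-digit branch of the full pattern, for illustration.
seven = re.compile(r'(?<!\S)(?=(?:-*\d-*){7}(?:\s|\Z))[\d-]+')


def first_match(text):
    """Return the first matched token, or None when nothing matches."""
    m = seven.search(text)
    return m.group(0) if m else None


print(first_match("call 555-0198 now"))  # 555-0198 (7 digits, preceded by a space)
print(first_match("311 is a joke"))      # None (only 3 digits)
print(first_match("id:555-0198"))        # None (preceded by a non-whitespace character)

# The positive lookbehind mixes a one-character alternative (\s) with a
# zero-width one (\A), so the standard library refuses to compile it.
try:
    re.compile(r'(?<=\s|\A)\d+')
    lookbehind_rejected = False
except re.error as exc:
    lookbehind_rejected = True
    print("re rejects the variable-length lookbehind:", exc)
```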



import re


corpus = """In the US 555-0198 and 1-206-5705-0100 are examples fictitious numbers.
            In the UK, 044-113-496-1834 is a fictitious number.
            In Ireland, the number 353-020-917-1234 is fictitious.
            And in Australia, 061-970-654-321 is a fictitious number.
            311 is a joke."""

regex = r'(?<!\S)(?:(?=(-*\d-*){7}(\s|\Z))[\d-]+|(?=(-*\d-*){11}(\s|\Z))[\d-]+|(?=(-*\d-*){12}(\s|\Z))[\d-]+|(?=(-*\d-*){13}(\s|\Z))[\d-]+)'
new_corpus = re.sub(regex, lambda m: re.sub(r'\d', '0', m[0]), corpus)
print(new_corpus)


Prints:

In the US 000-0000 and 0-000-0000-0000 are examples fictitious numbers.
            In the UK, 000-000-000-0000 is a fictitious number.
            In Ireland, the number 000-000-000-0000 is fictitious.
            And in Australia, 000-000-000-000 is a fictitious number.
            311 is a joke.
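
Because the four alternatives differ only in the digit count, the same pattern can also be generated from a list of lengths instead of being typed out by hand. The following is a sketch under that assumption, not part of the original answer. DIGIT_COUNTS, build_pattern and mask_numbers are hypothetical names, and the sample sentence is a shortened version of the corpus.

```python
import re

DIGIT_COUNTS = (7, 11, 12, 13)  # the lengths named in the question


def build_pattern(counts):
    """Build one lookahead branch per digit count and join them with '|'."""
    branches = "|".join(
        rf'(?=(?:-*\d-*){{{n}}}(?:\s|\Z))[\d-]+' for n in counts
    )
    # The negative lookbehind is shared by all branches.
    return re.compile(rf'(?<!\S)(?:{branches})')


def mask_numbers(text, counts=DIGIT_COUNTS):
    """Replace every digit with 0 inside tokens whose digit count is listed."""
    pattern = build_pattern(counts)
    return pattern.sub(lambda m: re.sub(r'\d', '0', m.group(0)), text)


sample = "In the US 555-0198 and 1-206-5705-0100 are fictitious. 311 is a joke."
print(mask_numbers(sample))
# In the US 000-0000 and 0-000-0000-0000 are fictitious. 311 is a joke.
```

Supporting another length is then a matter of adding it to DIGIT_COUNTS rather than copying one more branch into the regex.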

This question, "python - Replacing numbers of different lengths with Python (re.sub)", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/59298371/
