I use this code to keep only the <p> and <br> tags in a string.

from bs4 import BeautifulSoup

mystring = 'aaa<p>Radio and<BR> television.<br></p><p align="right">very<br/> popular in the world today.</p><p class="myclass">Millions of people watch TV. </p><p>That’s because a radio is very small <span style=":_black;">98.2%</span></p><p>and it‘s easy to carry. <span style=":_black;">haha100%</span></p>bb'
soup = BeautifulSoup(mystring,'html.parser')
for e in soup.find_all():
    if e.name not in ['p','br']:
        e.unwrap()
print(str(soup))

The result is:
aaa<p>Radio and<br/> television.<br/></p><p align="right">very<br> popular in the world today.</br></p><p class="myclass">Millions of people watch TV. </p><p>That’s because a radio is very small 98.2%</p><p>and it‘s easy to carry. haha100%</p>bb

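But I noticed that some of the <p> tags carry attributes, for example align and class.
What I actually want is to remove align="right", class="myclass", and any other attributes from the <p> tags, keeping only the bare <p> tag.
I want this result: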
aaa<p>Radio and<br/> television.<br/></p><p>very<br> popular in the world today.</br></p><p>Millions of people watch TV. </p><p>That’s because a radio is very small 98.2%</p><p>and it‘s easy to carry. haha100%</p>bb

I want to remove the attributes on the <p> tags.
How can I do that?

Best answer

Do you mean this:

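for e in soup.find_all():
    if e.name not in ['p', 'br']:
        # Drop the tag itself but keep its contents.
        e.unwrap()
    else:
        # Keep <p> and <br>, but clear all of their attributes.
        e.attrs = {}
print(str(soup))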
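
The same idea as a self-contained sketch, for trying it outside the question's script. The function name strip_tags_and_attrs, its keep parameter, and the sample string are made up for illustration; only BeautifulSoup, find_all, unwrap and attrs come from the code above.

```python
from bs4 import BeautifulSoup


def strip_tags_and_attrs(html, keep=("p", "br")):
    """Unwrap every tag not listed in `keep`; clear attributes on the rest.

    Hypothetical helper for illustration, not part of Beautiful Soup.
    """
    soup = BeautifulSoup(html, "html.parser")
    # find_all(True) returns a list of all tags, so modifying the tree
    # while looping over that list is safe.
    for tag in soup.find_all(True):
        if tag.name in keep:
            tag.attrs = {}   # keep the tag, drop align/class/etc.
        else:
            tag.unwrap()     # drop the tag, keep its children
    return str(soup)


sample = '<p align="right">very popular <span style="color:red">98.2%</span></p>'
print(strip_tags_and_attrs(sample))  # <p>very popular 98.2%</p>
```

Passing a different keep tuple, for example keep=("p",), would unwrap <br> as well.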

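Regarding "python - How to remove the attributes of <p> tags in HTML in Python, using a regular expression or some other way?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56014856/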
