BeautifulSoup findAll with a regex string: selecting links by their text
Problem description
I have links in HTML of the form:
<a href="/downloadsServlet?docid=abc" target="_blank">Report 1</a>
<a href="/downloadsServlet?docid=ixyz" target="_blank">Fetch Report 2 </a>
I am able to get a list of links of the above form using BeautifulSoup.
My code is as follows:
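import urllib2  # Python 2 standard library; on Python 3 use urllib.request instead
from bs4 import BeautifulSoup

# `url` is assumed to be defined elsewhere as the address of the page to scrape
html_page = urllib2.urlopen(url)
soup = BeautifulSoup(html_page)
# Collect every <a> tag (the method is findAll or find_all, not findall)
listOfLinks = list(soup.findAll('a'))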
However, I want to find only the links that have the word "Fetch" in the link text.
I tried the following:
soup.findAll('a', re.compile(".*Fetch.*"))
But that is not working. How do I select only the a tags that have an href and whose text contains the word "Fetch"?
Recommended answer
A regex may be overkill here, but it allows for possible extensions:
import re

def criterion(tag):
    # Keep tags that have an href attribute and whose text contains "Fetch"
    return tag.has_attr('href') and re.search('Fetch', tag.text)

soup.findAll(criterion)
# [<a href="/downloadsServlet?docid=ixyz" target="_blank">Fetch Report 2 </a>]
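For context on why the attempt in the question matched nothing: the second positional argument of findAll is attrs, and Beautiful Soup treats a non-dict value there as a filter on the CSS class, so the regex was compared against class names rather than link text. The sketch below is a self-contained version of the approach above that can be run without fetching a page. The sample markup reuses the two links from the question; the third link (without an href) and the helper name is_fetch_link are assumptions added for illustration. The keyword form using string= is an alternative that should behave the same on reasonably recent bs4 releases (older ones spell it text=).

```python
import re
from bs4 import BeautifulSoup

# Two links from the question, plus one hypothetical link without an href
# to show that the href check matters.
html = '''
<a href="/downloadsServlet?docid=abc" target="_blank">Report 1</a>
<a href="/downloadsServlet?docid=ixyz" target="_blank">Fetch Report 2 </a>
<a target="_blank">Fetch without href</a>
'''

soup = BeautifulSoup(html, 'html.parser')

def is_fetch_link(tag):
    # Hypothetical helper: an <a> tag with an href whose text contains "Fetch".
    return (tag.name == 'a'
            and tag.has_attr('href')
            and re.search(r'Fetch', tag.get_text()) is not None)

# Variant 1: pass a function as the filter.
fetch_links = soup.find_all(is_fetch_link)
hrefs = [a['href'] for a in fetch_links]

# Variant 2: keyword filters; href=True means "has an href attribute",
# and string= matches the regex against the tag's text.
fetch_links_kw = soup.find_all('a', href=True, string=re.compile(r'Fetch'))
hrefs_kw = [a['href'] for a in fetch_links_kw]

print(hrefs)
print(hrefs_kw)
```

The function form is the easier one to extend, for example by also checking the docid in the href. The keyword form is shorter, but it only matches tags whose content is a single string.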