This post covers how to use a regular expression with BeautifulSoup's findAll to select links by their text.

Problem description

I have links in HTML of the form:

<a href="/downloadsServlet?docid=abc" target="_blank">Report 1</a>
<a href="/downloadsServlet?docid=ixyz" target="_blank">Fetch Report 2 </a>

I am able to get a list of links of the above form using BeautifulSoup.

My code is as follows:

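import urllib2  # Python 2; on Python 3 use urllib.request instead
from bs4 import BeautifulSoup

html_page = urllib2.urlopen(url)
soup = BeautifulSoup(html_page, 'html.parser')
# The method is findAll (or find_all in bs4), not findall
listOfLinks = list(soup.findAll('a'))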

However, I want to find only the links that have the word "Fetch" in the link text.

I tried the following:

soup.findAll('a', re.compile(".*Fetch.*"))

But that does not work. How do I select only the a tags that have an href and whose text contains the word "Fetch"?

Recommended answer

A regex may be overkill here, but it allows for possible extensions:

import re

def criterion(tag):
  # Keep tags that have an href attribute and whose text contains "Fetch"
  return tag.has_attr('href') and re.search('Fetch', tag.text)

soup.findAll(criterion)
# [<a href="/downloadsServlet?docid=ixyz" target="_blank">Fetch Report 2 </a>]
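
As a self-contained check, the sketch below runs the same idea against the two links from the question, plus a third, hypothetical anchor without an href that is added here only to exercise the href test. The attempt in the question most likely fails because the second positional argument of findAll is attrs, and bs4 treats a non-dict value there as a CSS class filter rather than a text filter. The names has_fetch_link, hrefs, and alt_hrefs are illustrative, and the keyword form at the end assumes a bs4 version that accepts the string argument.

```python
import re
from bs4 import BeautifulSoup

# Sample markup: the two links from the question, plus one hypothetical
# anchor with no href to show that the href check matters.
html = '''
<a href="/downloadsServlet?docid=abc" target="_blank">Report 1</a>
<a href="/downloadsServlet?docid=ixyz" target="_blank">Fetch Report 2 </a>
<a id="top">Fetch nothing</a>
'''

soup = BeautifulSoup(html, 'html.parser')

def has_fetch_link(tag):
    # True only for <a> tags that carry an href and whose text contains "Fetch"
    return (tag.name == 'a'
            and tag.has_attr('href')
            and re.search(r'Fetch', tag.get_text()) is not None)

hrefs = [tag['href'] for tag in soup.find_all(has_fetch_link)]

# Keyword-argument alternative: href=True requires the attribute, and
# string= matches against tag.string, so it only works when the link
# has a single text child (no nested tags inside the <a>).
alt = soup.find_all('a', href=True, string=re.compile(r'Fetch'))
alt_hrefs = [tag['href'] for tag in alt]

print(hrefs)
print(alt_hrefs)
```

The function form is the more flexible of the two, since get_text() also covers links whose text is wrapped in nested tags such as span or b.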

