Get only the first link of a list of URLs with BeautifulSoup


Question

I parsed an entire HTML file and extracted some URLs with the BeautifulSoup module in Python, using this piece of code:

for link in soup.find_all('a'):
    for line in link:
        if "condition" in line:
            print(link.get("href"))

In the shell I get a series of links that satisfy the condition in the if statement:

http://..link1
http://..link2
http://..linkn

How can I put only the first link of this list into a variable "output"?

Edit:

The web page is http://download.cyanogenmod.com/?device=p970, and the script has to return the first short URL (http://get.cm/...) in the HTML page.

Recommended answer

You can do it with a one-liner:

import re

soup.find('a', href=re.compile(r'^http://get\.cm/get'))['href']
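
As a small illustration of what this call does, here is a sketch that runs it against a made-up HTML snippet instead of the real page. It assumes Python 3 and the bs4 package (not the older BeautifulSoup 3 import used further down), and the sample markup and link codes are invented for the example.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real download page.
SAMPLE_HTML = """
<a href="http://download.example/full.zip">full</a>
<a href="http://get.cm/get/aaa">short 1</a>
<a href="http://get.cm/get/bbb">short 2</a>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A compiled regex passed as an attribute filter is matched against each href.
pattern = re.compile(r"^http://get\.cm/get")

# find() returns the first matching tag in document order.
first = soup.find("a", href=pattern)["href"]
print(first)  # http://get.cm/get/aaa
```

Because find() stops at the first match, the second short link is never returned; find_all() with the same filter would return both.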

To assign it to a variable, just write:

variable = soup.find('a', href=re.compile(r'^http://get\.cm/get'))['href']
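
One thing to watch for: find() returns None when no link matches, so indexing the result with ['href'] raises a TypeError. A minimal sketch of a guard follows, assuming Python 3 and bs4. The helper name first_href and the sample markup are hypothetical and are not part of the original answer.

```python
import re
from bs4 import BeautifulSoup


def first_href(soup, pattern):
    """Return the href of the first <a> whose href matches pattern, or None."""
    tag = soup.find("a", href=re.compile(pattern))
    # find() gives None when nothing matches, so check before indexing.
    return tag["href"] if tag is not None else None


# Invented one-link document, used only to exercise the helper.
soup = BeautifulSoup('<a href="http://get.cm/get/aaa">x</a>', "html.parser")

variable = first_href(soup, r"^http://get\.cm/get")  # a matching link exists
missing = first_href(soup, r"^ftp://")               # no link matches
```

Here variable holds the matching URL and missing is None, which the caller can test for before using the value.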

I have no idea what exactly you are doing, so I will post the full code from scratch. NB: if you use bs4, change the imports.

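import re
import urllib2  # Python 2 only; Python 3 moved this to urllib.request

from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3; with bs4 use: from bs4 import BeautifulSoup

# Download the page and parse it.
request = urllib2.Request("http://download.cyanogenmod.com/?device=p970")
response = urllib2.urlopen(request)
soup = BeautifulSoup(response)

# find() returns the first <a> whose href matches, in document order.
variable = soup.find('a', href=re.compile(r'^http://get\.cm/get'))['href']
print(variable)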

>>> 
http://get.cm/get/4jj
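
For readers on Python 3 with bs4, the same script could be sketched roughly as below. This is an adaptation under stated assumptions, not the code from the answer: urllib.request replaces urllib2, the import comes from bs4, the parser is named explicitly, and the function names fetch_html and first_short_url are made up for the example. It also assumes the page from the question is still reachable.

```python
import re
import urllib.request

from bs4 import BeautifulSoup

PAGE_URL = "http://download.cyanogenmod.com/?device=p970"  # URL from the question
SHORT_LINK = re.compile(r"^http://get\.cm/get")


def fetch_html(url):
    """Download a page and return its raw bytes (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def first_short_url(html):
    """Return the first get.cm link in the markup, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("a", href=SHORT_LINK)
    return tag["href"] if tag is not None else None
```

Calling print(first_short_url(fetch_html(PAGE_URL))) would then play the role of the last two lines of the original script. Keeping the download and the parsing in separate functions lets the parsing step be tried on a saved or hand-written HTML string without network access.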
