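When I run my function to collect links from a particular site, it gets the links from the first page, but instead of moving on to the next page it stops with the error below.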

Crawler:

import requests
from lxml import html

def Startpoint(mpage):
    page=4
    while page<=mpage:
        address = "https://www.katalystbusiness.co.nz/business-profiles/bindex"+str(page)+".html"
        tail="https://www.katalystbusiness.co.nz/business-profiles/"
        page = requests.get(address)
        tree = html.fromstring(page.text)
        titles = tree.xpath('//p/a/@href')
        for title in titles:
            if "bindex" not in title:
                if "cdn-cgi" not in title:
                    print(tail + title)


    page+=1

Startpoint(5)


Error message:

Traceback (most recent call last):
  File "C:\Users\ar\AppData\Local\Programs\Python\Python35-32\New.py", line 19, in <module>
    Startpoint(5)
  File "C:\Users\ar\AppData\Local\Programs\Python\Python35-32\New.py", line 6, in Startpoint
    while page<=mpage:
TypeError: unorderable types: Response() <= int()

Best answer

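You are assigning the result of requests.get(address) to page, so the next time the loop condition is checked, Python cannot compare a requests.Response object with an int. Just give the response a different name, such as response. The last line, page+=1, also has an indentation error: it must sit inside the while loop.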
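To see the failure in isolation, here is a minimal sketch that reproduces it without any network access. FakeResponse and buggy_loop are hypothetical names introduced only for this illustration; FakeResponse merely stands in for requests.Response. The exact wording of the TypeError message depends on the Python version, so the sketch reports only the exception type.

```python
class FakeResponse:
    """Hypothetical stand-in for requests.Response (no network needed)."""
    text = "<html></html>"


def buggy_loop(mpage):
    """Mimic the question's loop: the counter name is reused for the response."""
    page = 4
    iterations = 0
    try:
        while page <= mpage:          # second check compares FakeResponse with int
            iterations += 1
            page = FakeResponse()     # the counter is overwritten here
    except TypeError as exc:
        return iterations, type(exc).__name__
    return iterations, None


result = buggy_loop(5)
print(result)
```

The loop body runs once, which matches the observation that only the first page is processed. The corrected version of the original function follows.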

import requests
from lxml import html

def Startpoint(mpage):
    page=4
    while page<=mpage:
        address = "https://www.katalystbusiness.co.nz/business-profiles/bindex"+str(page)+".html"
        tail="https://www.katalystbusiness.co.nz/business-profiles/"
        # Keep the HTTP response under its own name so page stays an int.
        response = requests.get(address)
        tree = html.fromstring(response.text)
        titles = tree.xpath('//p/a/@href')
        for title in titles:
            # Skip pagination ("bindex") links and "cdn-cgi" links.
            if "bindex" not in title:
                if "cdn-cgi" not in title:
                    print(tail + title)

        # Advance the counter inside the while loop.
        page+=1

Startpoint(5)
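One way to make this kind of name clash harder to reintroduce is to iterate with range and keep URL building and link filtering in small helper functions. The following is only a sketch under that assumption, not part of the original answer: the names BASE, page_urls and profile_links are hypothetical, and the helpers deliberately do no network I/O so they can be checked on their own.

```python
BASE = "https://www.katalystbusiness.co.nz/business-profiles/"


def page_urls(first, last):
    """Return the index page URLs for pages first..last (inclusive)."""
    return [BASE + "bindex" + str(n) + ".html" for n in range(first, last + 1)]


def profile_links(hrefs):
    """Drop pagination and cdn-cgi hrefs, then prefix the rest with BASE."""
    skip = ("bindex", "cdn-cgi")
    return [BASE + h for h in hrefs if not any(s in h for s in skip)]


urls = page_urls(4, 5)
links = profile_links(["bindex5.html", "acme.html", "/cdn-cgi/l/email"])
print(urls)
print(links)
```

In a real crawler, each URL from page_urls would be passed to requests.get, and the hrefs extracted with tree.xpath('//p/a/@href') would be passed to profile_links. The href values used above are made-up sample inputs.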

This Python question about trouble moving to the next page is based on a similar question on Stack Overflow: https://stackoverflow.com/questions/43548652/
