How to retrieve data when the page is loaded using AJAX

Question

I want to get the prices of mobile phones from this site: http://www.univercell.in/buy/SMART

I tried to test it, so I ran: scrapy shell http://www.univercell.in/control/AjaxCategoryDetail?productCategoryId=PRO-SMART&category_id=PRO-SMART&attrName=&min=&max=&sortSearchPrice=&VIEW_INDEX=2&VIEW_SIZE=15&serachupload=&sortupload=

But I am not able to connect to this site. Because the page is loaded using AJAX, I found the start_url using Firebug. Can anyone suggest where I am going wrong?

Recommended answer

Here is your spider:

import scrapy


class UnivercellItem(scrapy.Item):
    vendor = scrapy.Field()
    model = scrapy.Field()
    price = scrapy.Field()

# AJAX endpoint behind the listing page; %s is replaced by the page index (VIEW_INDEX).
BASE_URL = "http://www.univercell.in/control/AjaxCategoryDetail?productCategoryId=PRO-SMART&category_id=PRO-SMART&attrName=&min=&max=&sortSearchPrice=&VIEW_INDEX=%s&VIEW_SIZE=15&serachupload=&sortupload="

class UnivercellSpider(scrapy.Spider):
    name = "univercell_spider"
    allowed_domains = ["www.univercell.in"]
    # Request pages 1 through 20 of the AJAX endpoint directly.
    start_urls = [BASE_URL % index for index in range(1, 21)]

    def parse(self, response):
        # Each product is rendered inside a <div class="productsummary"> block.
        mobiles = response.xpath("//div[@class='productsummary']")
        for mobile in mobiles:
            item = UnivercellItem()
            item['vendor'] = mobile.xpath('.//div[1]/div/text()').extract()[0].strip()
            item['model'] = mobile.xpath('.//div[3]/div[1]/a/text()').extract()[0].strip()
            item['price'] = mobile.xpath('.//span[@class="regularPrice"]/span/text()').extract()[0].strip()
            yield item
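
The spider's start_urls come from substituting a page index into the VIEW_INDEX query parameter. As a minimal sketch of the same idea, the URL can also be assembled with the standard library's urllib.parse.urlencode so that each parameter is explicit. The helper name build_page_url and its default arguments are hypothetical and not part of the original answer; the endpoint and parameter names are copied from the URL above.

```python
from urllib.parse import urlencode

# Endpoint taken from the spider's BASE_URL.
ENDPOINT = "http://www.univercell.in/control/AjaxCategoryDetail"


def build_page_url(index, page_size=15, category="PRO-SMART"):
    """Return the AJAX listing URL for one page (hypothetical helper)."""
    params = [
        ("productCategoryId", category),
        ("category_id", category),
        ("attrName", ""),
        ("min", ""),
        ("max", ""),
        ("sortSearchPrice", ""),
        ("VIEW_INDEX", index),
        ("VIEW_SIZE", page_size),
        ("serachupload", ""),  # spelled exactly as in the site's own URL
        ("sortupload", ""),
    ]
    # urlencode joins the pairs with "&" and keeps empty values as "name=".
    return ENDPOINT + "?" + urlencode(params)


# Same 20 pages the spider requests.
start_urls = [build_page_url(i) for i in range(1, 21)]
print(len(start_urls))  # 20
```

Building the query string from pairs keeps the parameter order identical to the original URL, so the result should match BASE_URL % index for the same index.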

Save it to spider.py and run it with scrapy runspider spider.py -o output.json. Then in output.json you will see:

{"model": "T375", "vendor": "LG", "price": "Special Price Click Here"}
{"model": "P725 Optimus 3D Max", "vendor": "LG", "price": "Special Price Click Here"}
{"model": "P705 Optimus L7", "vendor": "LG", "price": "Special Price Click Here"}
{"model": "9320 Curve", "vendor": "Blackberry", "price": "Special Price Click Here"}
{"model": "Xperia Sola", "vendor": "Sony", "price": "Rs.14,500.00"}
{"model": "Xperia U", "vendor": "Sony", "price": "Special Price Click Here"}
{"model": "Lumia 610", "vendor": "Nokia", "price": "Special Price Click Here"}
...
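
Many of the scraped price values are the placeholder text "Special Price Click Here" rather than a number. A minimal sketch of one way to post-process the results, assuming the file holds one JSON object per line as in the excerpt above and that real prices look like "Rs.14,500.00"; the function names parse_price and load_priced_items are hypothetical.

```python
import json
import re

# Matches strings such as "Rs.14,500.00" (assumed price format).
PRICE_RE = re.compile(r"^Rs\.([\d,]+(?:\.\d+)?)$")


def parse_price(text):
    """Return the price as a float, or None for placeholder text."""
    match = PRICE_RE.match(text.strip())
    if match is None:
        return None
    # Drop the thousands separators before converting.
    return float(match.group(1).replace(",", ""))


def load_priced_items(lines):
    """Yield (vendor, model, price) for items that have a numeric price."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        item = json.loads(line)
        price = parse_price(item["price"])
        if price is not None:
            yield item["vendor"], item["model"], price


# Two lines copied from the sample output above.
sample = [
    '{"model": "T375", "vendor": "LG", "price": "Special Price Click Here"}',
    '{"model": "Xperia Sola", "vendor": "Sony", "price": "Rs.14,500.00"}',
]
print(list(load_priced_items(sample)))  # [('Sony', 'Xperia Sola', 14500.0)]
```

To run it on the real file, pass an open file object instead of the sample list, for example load_priced_items(open("output.json")).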

Hope that helps.

