HTML page is vastly different when using a headless webkit implementation with PyQt

Question

I was under the impression that a headless browser built on PyQt's webkit would automatically give me the complete HTML for each URL, even when the page contains heavy JS code. But I am only seeing part of it. I am comparing against the page I get when I save it from a Firefox window.

I am using the following code:

# Imports assume PyQt4. SEC and USER_AGENT are constants defined elsewhere
# in the asker's module (note that QTimer.singleShot takes milliseconds).
import sys

from PyQt4.QtCore import QTimer, QUrl
from PyQt4.QtGui import QApplication
from PyQt4.QtWebKit import QWebPage


class JabbaWebkit(QWebPage):
    # 'html' is a class variable

    def __init__(self, url, wait, app, parent=None):
        super(JabbaWebkit, self).__init__(parent)
        JabbaWebkit.html = ''

        if wait:
            # quit after a fixed delay
            QTimer.singleShot(wait * SEC, app.quit)
        else:
            # quit as soon as the initial page load finishes
            self.loadFinished.connect(app.quit)

        self.mainFrame().load(QUrl(url))

    def save(self):
        JabbaWebkit.html = self.mainFrame().toHtml()

    def userAgentForUrl(self, url):
        return USER_AGENT


def get_page(url, wait=None):
    # here is the trick for calling it several times
    app = QApplication.instance()  # check whether a QApplication already exists

    if not app:  # create a QApplication if it doesn't exist
        app = QApplication(sys.argv)

    form = JabbaWebkit(url, wait, app)
    app.aboutToQuit.connect(form.save)
    app.exec_()
    return JabbaWebkit.html

Can someone see anything obviously wrong with the code?

After running the code through a few URLs, here is one that shows the problem I am running into quite clearly: http://www.chilis.com/EN/Pages/menu.aspx

Thanks for any pointers.

Recommended answer

The page contains AJAX code. After the initial load finishes, it still needs some time to update the page via AJAX, but your code quits as soon as loading finishes.

You should add some code like this to wait for a while and keep processing events in webkit:

import time

for _ in range(200):  # wait at least 2 seconds (200 * 0.01 s)
    app.processEvents()
    time.sleep(0.01)
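
The loop above runs a fixed number of iterations, so the real wait grows with however long each processEvents() call takes. As a minimal sketch of the same idea with a wall-clock deadline: the helper name `pump_events` and its `done` callback are hypothetical (they do not appear in the question or the answer), and `time.monotonic` assumes Python 3.

```python
import time


def pump_events(process_events, seconds, interval=0.01, done=None):
    """Call process_events() repeatedly for up to `seconds` seconds.

    process_events: zero-argument callable, e.g. app.processEvents.
    done: optional zero-argument callable; return True to stop early
          (hypothetical hook, e.g. "the expected content has appeared").
    Returns True if `done` reported completion, False on timeout.
    """
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        process_events()          # let the event loop handle pending work
        if done is not None and done():
            return True           # stop as soon as the condition holds
        time.sleep(interval)      # avoid spinning the CPU between polls
    return False
```

With PyQt, calling `pump_events(app.processEvents, 2)` before reading `mainFrame().toHtml()` would correspond roughly to the loop above. Passing a `done` callback lets the wait end early instead of always taking the full timeout.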
