Repeatedly clicking an element on a page using Electron (NightmareJS)

Problem description

I'm writing a page scraper for a dynamic web page. The page has an initial load and then loads the remainder of its content after a short delay.

I've accounted for that delay and have successfully scraped the HTML from the page, but the page doesn't load ALL the content at once. Instead, it loads a specified amount of content via a GET request URL and then shows a "Get More" button on the page. My objective is to click this "Get More" button until all the content is loaded on the page. For those wondering, I don't want to load all the content at once via the GET URL because of the impact on their server.

I'm stuck on forming the loop or iteration that would let me click the button repeatedly.

const NIGHTMARE = require("nightmare");		
const BETHESDA = NIGHTMARE({ show: true });

BETHESDA
  // Open the bethesda web page. Web page will contain 20 mods to start.
  .goto("https://bethesda.net/en/mods/skyrim?number_results=40&order=desc&page=1&platform=XB1&product=skyrim&sort=published&text=")
  
  // Bethesda website serves all requested mods at once. Each mod has the class "tile". Wait for any tile class to appear, then proceed.
  .wait(".tile");

let additionalModsPresent = true;
while(additionalModsPresent) {
  setTimeout(function() {
    BETHESDA
      .wait('div[data-is="main-mods-pager"] > button')
      .click('div[data-is="main-mods-pager"] > button')
  }, 10000)
  

  additionalModsPresent = false;
}


//  let moreModsBtn = document.querySelector('div[data-is="main-mods-pager"] > button');

  // .end()
  BETHESDA.catch(function (error) {
    console.error('Search failed:', error);
  });

My thinking so far has been to use a while loop that attempts to click the button after some interval of time. If an error occurs, it's likely because the button no longer exists. The issue I'm having is that I can't seem to get the click to work inside a setTimeout or setInterval. I believe there is some sort of scoping issue, but I don't know exactly what is going on.

If I can get the click method to work in setInterval or something similar, the issue would be solved.

Thoughts?

Recommended answer

You can refer to the issue "Problem running nightmare in loops": https://github.com/segmentio/nightmare/issues/522

I modified your code following the guidelines given there. It seems to work fine.

const NIGHTMARE = require("nightmare");
const BETHESDA = NIGHTMARE({
  show: true
});

BETHESDA
  // Open the bethesda web page. Web page will contain 20 mods to start.
  .goto("https://bethesda.net/en/mods/skyrim?number_results=40&order=desc&page=1&platform=XB1&product=skyrim&sort=published&text=")

  // Bethesda website serves all requested mods at once. Each mod has the class "tile". Wait for any tile class to appear, then proceed.
  .wait(".tile");

next();

function next() {
  BETHESDA.wait('div[data-is="main-mods-pager"] > button')
    .click('div[data-is="main-mods-pager"] > button')
    .then(function() {
      console.log("click done");
      next();
    })
    .catch(function(err) {
      console.log(err);
      console.log("All done.");
    });
}

Eventually, the wait() for the button should time out, and you can then handle the error in the catch() block. Be aware that it goes on and on :) I did not wait until the end (you might run out of memory).
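Since the answer warns that the clicking can go on for a very long time, one option is to cap the number of clicks and to check for the button explicitly instead of relying on a wait() timeout. The following is only a minimal sketch under some assumptions: `clickUntilGone`, `maxClicks`, and `pauseMs` are hypothetical names introduced here, `browser` is assumed to be a Nightmare instance (whose `exists`, `click`, and `wait` chains can be awaited because they are thenable), and the fixed pause is assumed to be long enough for the next batch of tiles and the button to reappear.

```javascript
// Sketch: click a "load more" style button until it is gone or a cap is hit.
// `browser` is assumed to be a Nightmare instance; `maxClicks` and `pauseMs`
// are hypothetical parameters, not part of the Nightmare API.
async function clickUntilGone(browser, selector, maxClicks = 50, pauseMs = 1000) {
  let clicks = 0;
  while (clicks < maxClicks) {
    // exists() resolves to true or false rather than throwing on a timeout.
    const present = await browser.exists(selector);
    if (!present) break;

    await browser.click(selector);
    // Give the page time to load the next batch before checking again.
    await browser.wait(pauseMs);
    clicks += 1;
  }
  return clicks;
}

// Possible usage (assumes the nightmare package is installed):
//
//   const Nightmare = require("nightmare");
//   const browser = Nightmare({ show: true });
//   browser
//     .goto("https://bethesda.net/en/mods/skyrim?...")
//     .wait(".tile")
//     .then(() => clickUntilGone(browser, 'div[data-is="main-mods-pager"] > button', 20))
//     .then((n) => { console.log("clicked " + n + " times"); return browser.end(); })
//     .catch((err) => console.error(err));
```

Each awaited call runs one Nightmare action to completion before the loop continues, which is the same idea as the recursive next() function but with an upper bound. A fixed pause is a simplification; if the button disappears briefly while content loads, the loop may stop early, so the pause may need tuning for the actual page.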


09-22 19:19