Problem description
I am trying to retrieve the HTML of a site using a headless Chrome driver. However, I get a "permission denied" message. If I use a "regular" (non-headless) driver, it all works fine.
Is there any way to bypass that?
It's my first post, so I apologize for any formatting mistakes.
from selenium import webdriver

# Headless driver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
driver1 = webdriver.Chrome(
    executable_path='./chromedriver',
    options=chrome_options,
    service_args=['--verbose', '--log-path=/tmp/chromedriver.log'],
)
driver1.get('https://www.size.co.uk/')
html = driver1.page_source
html
The message I get is:
<html xmlns="http://www.w3.org/1999/xhtml"><head>\n<title>Access Denied</title>\n</head><body>\n<h1>Access Denied</h1>\n \nYou don\'t have permission to access "http://www.size.co.uk/" on this server.<p>\nReference #18.ac81655f.1548818550.73b12da\n\n\n</p></body></html>
Regular driver:
driver = webdriver.Chrome('./chromedriver')
driver.get('https://www.size.co.uk/')
html = driver.page_source
driver.quit()
html
Ideally, I'd like the same output as in the latter case, without new windows popping up every couple of seconds.
Recommended answer
Adding the following snippet got the page to return for me:
user_agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.50 Safari/537.36'
chrome_options.add_argument('user-agent={0}'.format(user_agent))
Since changing the user agent is enough to get the page, the site is evidently checking the user agent for headless browsers and denying them access. Here's an article on avoiding detection: Making Chrome Headless Undetectable
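For context, here is a minimal sketch of how the snippet above could slot into the question's headless setup. The helper names build_headless_arguments and fetch_page_source are hypothetical and only serve the illustration; the webdriver.Chrome call keeps the Selenium 3 style keyword arguments used in the question, which I'm assuming are still accepted by the installed version.

```python
# Regular (non-headless) user agent string taken from the answer above.
DESKTOP_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/60.0.3112.50 Safari/537.36"
)


def build_headless_arguments(user_agent=DESKTOP_USER_AGENT):
    """Return the Chrome switches for a headless session with a custom user agent.

    Hypothetical helper: it only assembles the argument strings, so it can be
    inspected without starting a browser.
    """
    return [
        "--headless",
        "--no-sandbox",
        "user-agent={0}".format(user_agent),
    ]


def fetch_page_source(url, driver_path="./chromedriver"):
    """Load url in headless Chrome and return the page HTML.

    Hypothetical helper. Assumes selenium is installed and that a chromedriver
    binary exists at driver_path, as in the question.
    """
    from selenium import webdriver  # imported here so the helper above has no dependencies

    chrome_options = webdriver.ChromeOptions()
    for argument in build_headless_arguments():
        chrome_options.add_argument(argument)

    # Selenium 3 style call, copied from the question.
    driver = webdriver.Chrome(executable_path=driver_path, options=chrome_options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Always shut the browser down, even if the page load fails.
        driver.quit()
```

Calling fetch_page_source('https://www.size.co.uk/') would then take the place of the driver1 lines in the question. Newer Selenium releases pass the driver path through a Service object rather than executable_path, so that one call may need adjusting.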
To see which user agent the driver is sending, you can run the following command:
driver.execute_script("return navigator.userAgent")
Chrome's headless user agent looks like the regular one, except that it identifies the browser as HeadlessChrome rather than Chrome.
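Instead of hard-coding a user agent, one option is to take the string the driver reports and rename the headless token. The sketch below assumes the headless build advertises itself with the token HeadlessChrome; the function names are hypothetical, and the sample string is an illustrative value based on the user agent above, not captured output.

```python
HEADLESS_TOKEN = "HeadlessChrome"  # assumption: token used by headless builds
REGULAR_TOKEN = "Chrome"


def is_headless_user_agent(user_agent):
    """Return True if the user agent string carries the headless token."""
    return HEADLESS_TOKEN in user_agent


def strip_headless_marker(user_agent):
    """Return the user agent with the headless token renamed to the regular one."""
    return user_agent.replace(HEADLESS_TOKEN, REGULAR_TOKEN)


# Illustrative value only; a real one would come from
# driver.execute_script("return navigator.userAgent").
sample = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) HeadlessChrome/60.0.3112.50 Safari/537.36"
)
cleaned = strip_headless_marker(sample)
```

The user agent is fixed when the browser starts, so the cleaned string has to be passed with chrome_options.add_argument('user-agent={0}'.format(cleaned)) to a newly created driver. It has no effect on a driver that is already running.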