How to Bypass Cloudflare's "Confirm You Are Not a Robot" Check with Python and Selenium


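Hi everyone, I'm Tao Xiaobai~

A while ago I was writing a ChatGPT script with the selenium library and ran into Cloudflare's bot check, the page that asks you to click to confirm you are not a robot. I searched Baidu for a long time without finding an answer, and the one Bilibili uploader I found would only share the method if you signed up for their course. In the end I got it solved with someone's help, so here is how to deal with it!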


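1. Why the browser gets detected

The site detects that the page is being driven by selenium. A selenium-driven browser exposes certain characteristic traits, and those traits trigger the bot check.

2. Solutions found online

I spent two days going through material, and in the end everything pointed to one library: Undetected-chromedriver.

Here is the solution one blogger gave: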

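# Like many other sites, Cloudflare checks whether a visit comes from a Selenium bot. One of its checks looks for JS variables that only exist while Selenium is running.

# These mainly include variables whose names contain "selenium" / "webdriver", and document variables whose names contain "$cdc_" / "$wdc_".

# Detection differs from driver to driver; the approach given here is based on chromedriver.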
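To make the idea of marker-based detection concrete, here is a minimal sketch in Python that scans a list of property names for the markers mentioned above. It only illustrates the principle and is not Cloudflare's actual logic; `find_automation_markers` and the sample names are hypothetical.

```python
# Markers taken from the description above; the real checks may differ.
MARKERS = ("selenium", "webdriver", "$cdc_", "$wdc_")


def find_automation_markers(property_names):
    """Return the names that contain any known automation marker."""
    hits = []
    for name in property_names:
        lowered = name.lower()
        if any(marker in lowered for marker in MARKERS):
            hits.append(name)
    return hits


if __name__ == "__main__":
    # Hypothetical property names, as a page script might enumerate them.
    sample = ["title", "$cdc_asdjflasutopfhvcZLmcfl_", "__webdriver_evaluate", "body"]
    print(find_automation_markers(sample))
```

A page script doing something similar would enumerate the properties of `document` or `window`, which is why renaming the `$cdc_` key is enough to slip past this particular check.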

# 1. Undetected-chromedriver
# A simple, easy-to-use package. Install it with pip and initialize the driver as shown; after that, use it like regular Selenium.

import undetected_chromedriver as uc
driver = uc.Chrome()
driver.get('https://nowsecure.nl')
# 2. Modify the chromedriver executable directly
# Change the key variable to any string that does not contain "cdc".

/**
 * Returns the global object cache for the page.
 * @param {Document=} opt_doc The document whose cache to retrieve. Defaults to
 *     the current document.
 * @return {!Cache} The page's object cache.
 */
function getPageCache(opt_doc, opt_w3c) {
  var doc = opt_doc || document;
  var w3c = opt_w3c || false;
  // |key| is a long random string, unlikely to conflict with anything else.
  var key = '$cdc_asdjflasutopfhvcZLmcfl_';
  if (w3c) {
    if (!(key in doc))
      doc[key] = new CacheWithUUID();
    return doc[key];
  } else {
    if (!(key in doc))
      doc[key] = new Cache();
    return doc[key];
  }
}

# There is no real difference between the two approaches: undetected-chromedriver essentially patches chromedriver at startup, performing the key-modification step for you.

def patch_exe(self):
    """
    Patches the ChromeDriver binary
    :return: False on failure, binary name on success
    """
    logger.info("patching driver executable %s" % self.executable_path)
 
    linect = 0
    replacement = self.gen_random_cdc()  # the cdc name is replaced here
    with io.open(self.executable_path, "r+b") as fh:
        for line in iter(lambda: fh.readline(), b""):
            if b"cdc_" in line:
                fh.seek(-len(line), 1)
                newline = re.sub(b"cdc_.{22}", replacement, line)
                fh.write(newline)
                linect += 1
        return linect
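To make the patch logic easier to follow, here is a minimal sketch of the same idea applied to an in-memory byte string instead of the driver file. `random_replacement` and `patch_bytes` are hypothetical names of my own, not functions from undetected-chromedriver. The important detail is that the replacement is exactly as long as the matched `cdc_` string (26 bytes), so the size and layout of the binary stay unchanged.

```python
import random
import re
import string
from typing import Tuple

# "cdc_" followed by 22 arbitrary bytes, the same pattern as in the patch above.
CDC_PATTERN = re.compile(rb"cdc_.{22}")

# Leave out "c" so that "cdc_" cannot reappear, even across the boundary
# between the replacement and the bytes that follow it.
ALPHABET = string.ascii_lowercase.replace("c", "")


def random_replacement() -> bytes:
    """Build a random 26-byte string (hypothetical helper)."""
    return "".join(random.choice(ALPHABET) for _ in range(26)).encode("ascii")


def patch_bytes(data: bytes) -> Tuple[bytes, int]:
    """Replace every cdc_ key in data; return (patched data, replacement count)."""
    return CDC_PATTERN.subn(random_replacement(), data)


if __name__ == "__main__":
    sample = b"var key = '$cdc_asdjflasutopfhvcZLmcfl_';"
    patched, count = patch_bytes(sample)
    print(count, patched)
```

Working on a copy of the bytes like this is also a safe way to try the substitution before touching a real driver file.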

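I tested this library myself for a long time and kept running into errors. The discussions on GitHub are not much help either; most of them leave the problem unresolved.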

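3. Prerequisites

3.1 Google Chrome version 117 or 116; the latest version, 118, does not work

3.2 Download the driver that matches your Chrome version

3.3 Download the undetected_chromedriver.exe I provide and put it in the same directory as your Python code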

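The download link is attached at the end of the article!

Note: uninstall your own Google Chrome first, then download the driver and put it in the root of your Python installation directory. Also search Baidu for how to disable Chrome's automatic updates, and turn them off. Once the Chrome build I provide is installed, you can start testing; disabling updates is something you can sort out yourself afterwards.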
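Since everything here depends on the browser and driver versions lining up, a quick check before launching can save some head-scratching. The following is a minimal sketch, assuming the version text looks like `Google Chrome 117.0.5938.92` (the specific version numbers are only examples). `major_version` and `versions_ok` are hypothetical helpers, and the supported set simply reflects the versions listed above.

```python
import re

# Major versions reported as working in the prerequisites above.
SUPPORTED_MAJORS = {116, 117}


def major_version(version_text):
    """Extract the major version from text such as 'Google Chrome 117.0.5938.92'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_text)
    if match is None:
        raise ValueError("no version number found in %r" % version_text)
    return int(match.group(1))


def versions_ok(chrome_text, driver_text):
    """True if Chrome is a supported major version and the driver matches it."""
    chrome_major = major_version(chrome_text)
    driver_major = major_version(driver_text)
    return chrome_major in SUPPORTED_MAJORS and chrome_major == driver_major


if __name__ == "__main__":
    print(versions_ok("Google Chrome 117.0.5938.92", "ChromeDriver 117.0.5938.88"))
```

How you obtain the two version strings is up to you; copying them by hand from chrome://version/ and from the driver's download page is enough for a one-off check.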

4. Add the following code

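    from selenium.webdriver import Chrome, ChromeOptions
    from selenium.webdriver.chrome.service import Service
    chrome_options = ChromeOptions()
    chrome_options.add_argument('--disable-infobars')
    chrome_options.add_argument('--disable-blink-features=AutomationControlled')
    # user_data_dir: path to your local Chrome user data (see below)
    chrome_options.add_argument('--user-data-dir={}'.format(user_data_dir))
    driver = Chrome(service=Service('./undetected_chromedriver.exe'), options=chrome_options)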

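For user_data_dir I point directly at my local Google Chrome user data. With this, the browser can use the account I am already logged into. Of course, you can also remove this line. To find the path, type chrome://version/ into Chrome's address bar and press Enter.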
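One detail worth spelling out: as far as I know, the chrome://version/ page lists a "Profile Path" that points at the profile folder itself (often `Default`), while `--user-data-dir` expects its parent directory. Here is a minimal sketch of that conversion, assuming a Windows-style path. The helper names and the example path are hypothetical.

```python
from pathlib import PureWindowsPath


def user_data_dir_from_profile_path(profile_path):
    """Return the parent of the profile folder, i.e. the user data directory."""
    return str(PureWindowsPath(profile_path).parent)


def user_data_dir_argument(profile_path):
    """Build the Chrome command-line argument for the user data directory."""
    return "--user-data-dir={}".format(user_data_dir_from_profile_path(profile_path))


if __name__ == "__main__":
    # Hypothetical "Profile Path" value copied from chrome://version/
    example = r"C:\Users\me\AppData\Local\Google\Chrome\User Data\Default"
    print(user_data_dir_argument(example))
```

The resulting string can be passed to `chrome_options.add_argument(...)` in place of the hand-formatted one.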
