Using selenium to automatically obtain cookies for interface-request crawling with requests


For fixes to common crawler errors, notes on crawler tools, and other related questions, see my crawler column: 博主_zkkkkkkkkkkkkk's crawler column

1. Idea

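selenium is a third-party Python package that drives a browser to simulate user actions, while requests is used to send HTTP requests to interfaces. Combining the two can, in some situations, work better than a crawler built on either package alone. Many interfaces only respond correctly when a request made with requests carries suitable headers, so obtaining the cookies and other key header values automatically is important. Because selenium opens a real browser and acts like a user, it can read the cookies of the pages it has opened. We pass those cookies to requests and then do the actual crawling with requests.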
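To make the hand-off concrete, here is a minimal sketch of the data involved. selenium's `get_cookies()` returns a list of dicts, one per cookie, and requests accepts a plain name-to-value dict through its `cookies=` parameter. The helper name `selenium_cookies_to_dict` and the sample cookie values are made up for illustration; they are not taken from any real site.

```python
def selenium_cookies_to_dict(selenium_cookies):
    """Reduce selenium-style cookie dicts to a {name: value} mapping."""
    return {cookie["name"]: cookie["value"] for cookie in selenium_cookies}


# Hypothetical sample in the shape that chrome.get_cookies() returns
sample_cookies = [
    {"name": "sessionid", "value": "abc123", "domain": "example.com", "path": "/"},
    {"name": "lang", "value": "zh-CN", "domain": "example.com", "path": "/"},
]

cookie_dict = selenium_cookies_to_dict(sample_cookies)
print(cookie_dict)
```

In a real script the resulting dict could be passed as `requests.get(url, cookies=cookie_dict)`, which avoids building the `Cookie` header by hand. Note that this drops the domain and path attributes, so it is only a reasonable shortcut when all requests go to the site the cookies came from.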

2. Code

Here is a simple example to illustrate the idea (some sites may not work with this approach, so test it against your own target).

from selenium import webdriver
import requests

# Start selenium and open the http://tpi.zhonju.cn/ page
chrome = webdriver.Chrome()
chrome.get('http://tpi.zhonju.cn/')

# Print the page's cookies; get_cookies() returns a list of dicts, one per cookie
print(chrome.get_cookies())

# Join the cookies into a single "name=value; name=value" string
cookies_list = [item["name"] + "=" + item["value"] for item in chrome.get_cookies()]
cookies = '; '.join(cookies_list)
print(cookies)


# With the cookies in hand, requests can be used to crawl the interface
headers = {
    'Content-Type':'application/json;charset=UTF-8',
    'Cookie': cookies,
    'Connection':'keep-alive',
    'Accept':'text/html,application/xhtml+xml,application/xml',
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36'
}
req = requests.get('http://xxx.com/',headers=headers)
# Output the response as text
print(req.text)
# Output the response as parsed JSON (only works if the interface returns JSON)
print(req.json())
# Output the response as raw bytes
print(req.content)

# Finally, close the browser opened by selenium; its cookie store is discarded with it
chrome.quit()
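
As an alternative to concatenating the `Cookie` header manually, the cookies can be loaded into a `requests.Session`, which keeps each cookie's domain and path and sends the cookies automatically on later requests. This is only a sketch under assumptions: the helper name `session_from_selenium_cookies` and the sample cookie data are hypothetical, and in a real script the input would be `chrome.get_cookies()`, read before `chrome.quit()` is called.

```python
import requests


def session_from_selenium_cookies(selenium_cookies):
    """Build a requests.Session preloaded with cookies exported by selenium."""
    session = requests.Session()
    for cookie in selenium_cookies:
        # Keep domain and path so requests only sends each cookie where it belongs
        session.cookies.set(
            cookie["name"],
            cookie["value"],
            domain=cookie.get("domain", ""),
            path=cookie.get("path", "/"),
        )
    return session


# Hypothetical sample in the shape that chrome.get_cookies() returns
sample_cookies = [
    {"name": "sessionid", "value": "abc123", "domain": "example.com", "path": "/"},
    {"name": "lang", "value": "zh-CN", "domain": "example.com", "path": "/"},
]

session = session_from_selenium_cookies(sample_cookies)
print(session.cookies.get("sessionid"))
```

Requests made afterwards with `session.get(...)` or `session.post(...)` carry the matching cookies, and any `Set-Cookie` responses from the server update the session, which the fixed header string above would not do.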
