Python Web Crawling: Requests


Requests

A detailed guide to using the requests library in Python:

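1. Introduction:

Requests is an HTTP library written in Python, built on urllib3 and released under the Apache 2.0 open-source license.

Compared with the standard-library urllib, Requests is more convenient and handles URL resources smoothly.

It saves a great deal of work, so it is the recommended library for writing crawlers.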

2. Installing the Requests library:

From the command line: pip install requests

Installing in PyCharm: add the requests package from the project's Python Interpreter settings.

Importing it in a project: import requests
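A quick way to confirm that the installation worked is to import the package and print its version. This is a minimal check added for illustration; the version string it prints depends on your environment.

```python
import requests

# __version__ is a plain string; its exact value depends on what pip installed.
print(requests.__version__)
```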

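The requests library has 7 main methods and 13 keyword arguments:

Method Description

requests.request() Builds and sends a request; the most basic method, which all the methods below are built on

requests.get() Fetches a page; corresponds to the HTTP GET method

requests.post() Submits data to a page; corresponds to the HTTP POST method

requests.head() Fetches only the header information of a page; corresponds to the HTTP HEAD method

requests.put() Submits a PUT request to a page; corresponds to the HTTP PUT method

requests.patch() Submits a partial-modification request to a page; corresponds to the HTTP PATCH method

requests.delete() Submits a delete request to a page; corresponds to the HTTP DELETE method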
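To illustrate how the convenience methods relate to requests.request(): a call such as requests.get(url) is shorthand for requests.request("GET", url). The sketch below prepares each kind of request without sending it, so it runs offline. The helper name build and the httpbin.org URL are assumptions chosen for this example.

```python
import requests


def build(method, url, **kwargs):
    """Prepare (but do not send) a request so it can be inspected."""
    return requests.Request(method, url, **kwargs).prepare()


# The same helper covers every HTTP verb, just as requests.request() does.
for method in ("GET", "POST", "HEAD", "PUT", "PATCH", "DELETE"):
    prepared = build(method, "http://httpbin.org/anything")
    print(prepared.method, prepared.url)
```

Sending works the same way: requests.request("DELETE", url) and requests.delete(url) issue the same kind of request.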

3. Basic usage:

import requests

response = requests.get('http://www.baidu.com')
print(response.status_code)        # print the status code
print(response.url)                # print the requested URL
print(response.headers)            # print the response headers
print(response.cookies)            # print the cookies
print(response.text)               # page source as text; the type is str
print(response.content)            # page source as a byte stream; the type is bytes
print(response.apparent_encoding)  # the encoding detected from the page content
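In practice a request can fail (no network, a timeout, an error status), so it may help to wrap the call. The following is a minimal sketch, not part of the original article: the function name fetch_summary and the 10-second default timeout are assumptions. It relies on the timeout argument, response.raise_for_status() and the requests.RequestException base class.

```python
import requests


def fetch_summary(url, timeout=10):
    """Fetch url and return a few response attributes, or None if the request fails."""
    try:
        # Without a timeout, a stalled server can block the program indefinitely.
        response = requests.get(url, timeout=timeout)
        # Turn 4xx/5xx status codes into exceptions instead of silently continuing.
        response.raise_for_status()
    except requests.RequestException as exc:
        print(f"request failed: {exc}")
        return None
    return {
        "status_code": response.status_code,
        "url": response.url,
        "content_type": response.headers.get("Content-Type"),
        "length_in_bytes": len(response.content),
    }
```

Calling fetch_summary('http://www.baidu.com') returns a dict when the request succeeds; a malformed URL or an unreachable host yields None rather than a traceback.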

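GET requests:

A GET request carries its parameters in the URL, where they are directly visible; over plain HTTP they travel in cleartext.

response = requests.get('http://www.baidu.com')

GET is used to retrieve data from the server, including static resources (HTML, JS, CSS, images and so on) and dynamic data (list data, detail data and so on).

The text attribute of the return value gives the content of the response: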

import requests

response = requests.get("http://www.baidu.com")
response.encoding = "utf-8"  # decode as UTF-8 so that Chinese text displays correctly
print(response.text)

With that, we have fetched a web page programmatically.
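Why the encoding sometimes has to be set by hand: when a server's Content-Type header names no charset for a text response, requests falls back to ISO-8859-1, and UTF-8 bytes decoded that way come out garbled. The sketch below shows this offline with requests.utils.get_encoding_from_headers; the header dicts and the sample string are hand-written assumptions, not real server responses.

```python
from requests.utils import get_encoding_from_headers

# With an explicit charset, requests uses it.
print(get_encoding_from_headers({"content-type": "text/html; charset=utf-8"}))
# Without one, text/* responses fall back to ISO-8859-1.
print(get_encoding_from_headers({"content-type": "text/html"}))

raw = "百度一下".encode("utf-8")     # the bytes a UTF-8 page would send
garbled = raw.decode("iso-8859-1")  # decoded with the fallback encoding
correct = raw.decode("utf-8")       # decoded after setting the encoding to UTF-8

# Each Chinese character became three unrelated characters in the garbled text.
print(len(garbled), len(correct))
```

When the right encoding is not known in advance, response.apparent_encoding can be assigned to response.encoding instead of a hard-coded value.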

Sometimes we also need to send parameters with the request.

get() with parameters, method 1:

url = 'http://www.baidu.com/s?page=2'  # append the parameters after "?"
response = requests.get(url)
print(response.text)

get() with parameters, method 2:

url = 'http://www.baidu.com/s'
data = {'page': '2'}  # the parameters to send, passed via params
response = requests.get(url, params=data)
print(response.text)
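The two methods should produce the same URL; the advantage of params is that requests does the URL encoding itself, which matters for non-ASCII values. This sketch prepares the request without sending it, so the final URL can be inspected. The helper name preview_url and the wd parameter are assumptions used only for illustration.

```python
import requests


def preview_url(url, params):
    """Return the final URL requests would use, without sending anything."""
    return requests.Request("GET", url, params=params).prepare().url


# Same result as writing the query string by hand.
print(preview_url("http://www.baidu.com/s", {"page": "2"}))
# Non-ASCII values are percent-encoded as UTF-8 automatically.
print(preview_url("http://www.baidu.com/s", {"wd": "天气", "page": 2}))
```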

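Some sites require browser information with every request and return an error if no headers are sent. To send custom headers, simply pass a dict to the headers parameter:

import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36"
}
response = requests.get("https://www.zhihu.com/explore", headers=headers)
print(response.text)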
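One likely reason such sites reject a bare request is that requests identifies itself with a User-Agent of the form python-requests/<version> unless told otherwise. The sketch below compares the default header with a custom one, using a Session to prepare both requests without sending them. The browser string is the one used in the crawler example in this article; the variable names are assumptions.

```python
import requests
from requests.utils import default_user_agent

URL = "https://www.zhihu.com/explore"
custom_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36"
}

session = requests.Session()
# prepare_request merges the session's default headers with the request's own.
default_req = session.prepare_request(requests.Request("GET", URL))
custom_req = session.prepare_request(requests.Request("GET", URL, headers=custom_headers))

print(default_user_agent())               # what requests sends by default
print(default_req.headers["User-Agent"])  # same value, taken from the prepared request
print(custom_req.headers["User-Agent"])   # the browser-like value overrides it
```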

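POST requests

A POST request carries its data in the request body rather than in the URL. The data can still be seen with the browser's developer tools or a packet capture, and over plain HTTP it is likewise sent in cleartext.

POST is used to submit data to the server, for example to create, delete or modify data: submitting a form to create a new user, or to update an existing one.

A typical call looks like this:

response = requests.post(url=url, headers=headers, data=data_search)

For POST requests, parameters are generally passed through the data argument. Here is the code: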

data = {
    'name': 'zhangsan',
    'age': 22,
    'sex': '男'
}
response = requests.post('http://httpbin.org/post', data=data)
# print(response.text)  # Chinese characters show up as \uXXXX escape sequences
print(response.content.decode("unicode-escape"))
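To see what the data argument actually puts on the wire, the request can be prepared without sending it. The sketch below compares data= (form encoding) with json=, and then shows how a JSON reply containing \uXXXX escapes decodes back to Chinese text. The sample_reply bytes are a hand-written stand-in for a server reply, not real httpbin output; with a live response, response.json() is the usual way to get the decoded values.

```python
import json

import requests

data = {"name": "zhangsan", "age": 22, "sex": "男"}

# data= sends an HTML-form style body.
form_req = requests.Request("POST", "http://httpbin.org/post", data=data).prepare()
print(form_req.headers["Content-Type"])  # application/x-www-form-urlencoded
print(form_req.body)                     # name=zhangsan&age=22&sex=%E7%94%B7

# json= sends the same dict as a JSON document instead.
json_req = requests.Request("POST", "http://httpbin.org/post", json=data).prepare()
print(json_req.headers["Content-Type"])  # application/json

# Hypothetical reply body: JSON encoders often escape non-ASCII characters like this.
sample_reply = b'{"form": {"sex": "\\u7537"}}'
decoded = json.loads(sample_reply)
print(decoded["form"]["sex"] == "男")
```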

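In the output, the "form" value shows that the parameters were passed successfully and echoed back to us by the server.

A simple crawler example using requests: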

# Weather site crawler example for the Xi'an area

# -*- coding:utf-8 -*-
'''
@Author: 董咚咚
@contact: 2648633809@qq.com
@Time: 2023/7/31 14:59
@version: 1.0
'''
import requests
from lxml import etree


class WeatherSpider:
    def __init__(self):
        self.url = "http://www.weather.com.cn/weather/101110101.shtml"
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36"}

    def get_url_content(self):
        # Download the page and decode the bytes (decode() defaults to UTF-8)
        return requests.get(self.url, headers=self.headers).content.decode()

    def get_weather_data(self, html):
        tmp_html = etree.HTML(html)
        # li[2] is the second entry of the 7-day list, i.e. tomorrow's forecast
        tomorrow_doc = tmp_html.xpath("//div[contains(@class,'con') and contains(@class,'today')]//div[@class='c7d']/ul/li[2]")[0]
        weather_data = {}
        weather_data["date"] = tomorrow_doc.xpath("./h1/text()")[0]
        weather_data["weather"] = tomorrow_doc.xpath("./p[@class='wea']/@title")[0]
        weather_data["temperature_max"] = tomorrow_doc.xpath("./p[@class='tem']/span/text()")[0]
        weather_data["temperature_min"] = tomorrow_doc.xpath("./p[@class='tem']/i/text()")[0]
        weather_data["air_speed"] = tomorrow_doc.xpath("./p[@class='win']/i/text()")[0]
        return weather_data

    def run(self):
        # Fetch the page, extract tomorrow's forecast, and print it
        content_html = self.get_url_content()
        data = self.get_weather_data(content_html)
        print(data)


if __name__ == '__main__':
    spider = WeatherSpider()
    spider.run()

The output looks like this:

{'date': '18日（明天）', 'weather': '多云转晴', 'temperature_max': '24', 'temperature_min': '10℃', 'air_speed': '3-4级'}
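The XPath expressions in the crawler index the result with [0], which raises IndexError as soon as the page layout changes or a field is missing. A more defensive variant might look like the sketch below. It runs offline against SAMPLE_HTML, a hand-written fragment that only imitates the structure the crawler expects (the real page may differ); parse_day, LIST_XPATH and FIELDS are hypothetical names introduced here.

```python
from lxml import etree

# Hand-written stand-in for the forecast markup; not copied from the real site.
SAMPLE_HTML = """
<div class="con today clearfix">
  <div class="c7d">
    <ul>
      <li><h1>17日（今天）</h1><p class="wea" title="晴">晴</p>
          <p class="tem"><span>25</span>/<i>11℃</i></p>
          <p class="win"><i>&lt;3级</i></p></li>
      <li><h1>18日（明天）</h1><p class="wea" title="多云转晴">多云转晴</p>
          <p class="tem"><span>24</span>/<i>10℃</i></p>
          <p class="win"><i>3-4级</i></p></li>
    </ul>
  </div>
</div>
"""

LIST_XPATH = ("//div[contains(@class,'con') and contains(@class,'today')]"
              "//div[@class='c7d']/ul/li")
FIELDS = {
    "date": "./h1/text()",
    "weather": "./p[@class='wea']/@title",
    "temperature_max": "./p[@class='tem']/span/text()",
    "temperature_min": "./p[@class='tem']/i/text()",
    "air_speed": "./p[@class='win']/i/text()",
}


def parse_day(html, index):
    """Return the forecast for the index-th list item (0 = today), or None if absent."""
    items = etree.HTML(html).xpath(LIST_XPATH)
    if index >= len(items):
        return None
    day = items[index]
    result = {}
    for name, path in FIELDS.items():
        found = day.xpath(path)
        # A missing field becomes None instead of raising IndexError.
        result[name] = found[0] if found else None
    return result


forecast = parse_day(SAMPLE_HTML, 1)  # index 1 matches li[2] in the crawler: tomorrow
```

Printing forecast gives a dict with the same keys as the crawler's output; a page without the expected list yields None, which the caller can check before using the data.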
