python3 爬虫相关学习9：BeautifulSoup 官方文档学习-Toy模板网

这篇具有很好参考价值的文章主要介绍了python3 爬虫相关学习9：BeautifulSoup 官方文档学习。希望对大家有所帮助。如果存在错误或未考虑完全的地方，请大家不吝赐教，您也可以点击"举报违法"按钮提交疑问。

1 BeautifulSoup 官方文档

2 用bs 和 requests 打开本地html的区别：代码里的一段html内容

2.1 代码和运行结果

2.2 用beautiful 打开本地 html 文件

2.2.1 本地html文件

2.2.2 soup1=BeautifulSoup(html1,"lxml")

2.3 用requests打开本地 html 文件

2.3.1 本地html文件

2.3.2 print(html1)

3 用bs 和 requests 打开本地html的区别：一个独立的html文件

3.1 独立创建一个html文件

3.2 下面是新得代码和运行结果

3.3 用beautiful 打开本地 html 文件

3.3.1 语法差别 soup1=BeautifulSoup(open(path1))

3.4 用 read() 打开本地 html 文件

3.4.1 语法差别 with open(path1 ,"r") as f: 和 res=f.read()

3.5 用requests打开本地 html 文件

4 f.write(soup1.prettify()) 和 html 用 read()读出来差别很大

1 BeautifulSoup 官方文档

Beautiful Soup: We called him Tortoise because he taught us.https://www.crummy.com/software/BeautifulSoup/

python3 爬虫相关学习9：BeautifulSoup 官方文档学习

Beautiful Soup 4.4.0 文档 — Beautiful Soup 4.2.0 中文文档https://beautifulsoup.readthedocs.io/zh_CN/v4.4.0/

Beautiful Soup 4.4.0 文档 — beautifulsoup 4.4.0q 文档https://beautifulsoup.readthedocs.io/zh_CN/latest/

python3 爬虫相关学习9：BeautifulSoup 官方文档学习

2 用bs 和 requests 打开本地html的区别：代码里的一段html内容

2.1 代码和运行结果

#E:\work\FangCloudV2\personal_space\2learn\python3\py0003.txt

import requests
from bs4 import BeautifulSoup

#html文件内容
html1 = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

#"测试bs4"
print ("测试bs4")
soup1=BeautifulSoup(html1,"lxml")
print (soup1.prettify())

#"对比测试requests"
print ("对比测试requests")
#res=requests.get(html1)
res=html1
#print (res.text)
print (res)

python3 爬虫相关学习9：BeautifulSoup 官方文档学习

2.2 用beautiful 打开本地 html 文件

#"测试bs4"

html1=""" ... """
print ("测试bs4")
soup1=BeautifulSoup(html1,"lxml")
print (soup1.prettify())

2.2.1 本地html文件

这次的本地html 文件是写在 python 脚本内容一起的一段文本
html1=""" ... """

2.2.2 soup1=BeautifulSoup(html1,"lxml")

正确写法
soup1=BeautifulSoup(html1,"lxml")
lxml 是解析方式
如果不写，默认也会采用 lxml的解析
如果写成 soup1=BeautifulSoup(html1) 可以正常运行，但是会提醒

python3 爬虫相关学习9：BeautifulSoup 官方文档学习

lxml

html.parser

应该这几种都可以

2.3 用requests打开本地 html 文件

#"对比测试requests"
print ("对比测试requests")
#res=requests.get(html1)
res=html1
#print (res.text)
print (res)

2.3.1 本地html文件

这次的本地html 文件是写在 python 脚本内容一起的一段文本
html1=""" ... """
本地文件 html 已经是一段脚本内的文本 """ ..."""

2.3.2 print(html1)

本地文件 html 已经是一段脚本内的文本 """ ..."""

正确写法1
res=html1
print (res)

正确写法2
print (html1)

错误写法1
#print (res.text)
#print (html1.text)
只有html作为网页结构的时候，可以用 html.text 取到其中的string 内容
所以
requests.get(url)
requests.get(url).text

requests.exceptions.InvalidSchema: No connection adapters were found for '<html><head><title>The Dormouse\'s story</title></head>\n<body>\nThe Dormouse\'s story\n\nOnce upon a time there were three little sisters; and their names were\n<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,\n<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and\n<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;\nand they lived at the bottom of a well.\n\n...\n'

python3 爬虫相关学习9：BeautifulSoup 官方文档学习

错误写法2
#res=requests.get(html1)
一样的原因
因为这里的html1 不是网页，而已经是网页的内容string了！

AttributeError: 'str' object has no attribute 'text'

python3 爬虫相关学习9：BeautifulSoup 官方文档学习

3 用bs 和 requests 打开本地html的区别：一个独立的html文件

3.1 独立创建一个html文件

python3 爬虫相关学习9：BeautifulSoup 官方文档学习

3.2 下面是新得代码和运行结果

代码

#E:\work\FangCloudV2\personal_space\2learn\python3\py0003-1.txt
#E:\work\FangCloudV2\personal_space\2learn\python3\html0003.html


import requests
import os
import time
from bs4 import BeautifulSoup


path1=r"E:\work\FangCloudV2\personal_space\2learn\python3\html0003.html"
soup1=BeautifulSoup(open(path1))
print ("测试bs4")
print (soup1.prettify())

path2=r'E:\work\FangCloudV2\personal_space\2learn\python3\html0003-1.html'
if not os.path.exists(path2):              
    os.mkdir(path2) 

with open(path2 ,"a") as f:
    f.write("测试bs4")
    f.write(soup1.prettify())


print ("对比测试requests")
with open(path1 ,"r") as f:
    res=f.read()
print (res)

with open(path2 ,"a") as f:
    f.write("对比测试requests")
    f.write(res)



"""
#地址，路径，前都记得加 r, 因为string 内部包含\/等转义符，rawdata安全
url1="E:\work\FangCloudV2\personal_space\2learn\python3\html0003.html"
url1=r"E:\work\FangCloudV2\personal_space\2learn\python3\html0003.html"
res=requests.get(url1)
#本地地址不能像网址 url这样用，用的\/不同，即使用 raw r 也不行. 可以用转格式函数吗？
#https://www.baidu.com/
"""

运行结果

python3 爬虫相关学习9：BeautifulSoup 官方文档学习

另存为的文件内容

python3 爬虫相关学习9：BeautifulSoup 官方文档学习

3.3 用beautiful 打开本地 html 文件

3.3.1 语法差别 soup1=BeautifulSoup(open(path1))

最大的差别

soup1=BeautifulSoup(open(path1))
soup1.prettify() 输出格式化得内容

path1=r"E:\work\FangCloudV2\personal_space\2learn\python3\html0003.html"
soup1=BeautifulSoup(open(path1))
print ("测试bs4")
print (soup1.prettify())

path2=r'E:\work\FangCloudV2\personal_space\2learn\python3\html0003-1.html'
if not os.path.exists(path2):
os.mkdir(path2)

with open(path2 ,"a") as f:
f.write("测试bs4")
f.write(soup1.prettify())

python3 爬虫相关学习9：BeautifulSoup 官方文档学习

3.4 用 read() 打开本地 html 文件

3.4.1 语法差别 with open(path1 ,"r") as f: 和 res=f.read()

和 read()读出来的内容（应该和 requests.get()得出来得内容一样）

print ("对比测试requests")
with open(path1 ,"r") as f:
res=f.read()
print (res)

with open(path2 ,"a") as f:
f.write("对比测试requests")
f.write(res)

python3 爬虫相关学习9：BeautifulSoup 官方文档学习

3.5 用requests打开本地 html 文件

没试过
这种本体html没法试把？

4 f.write(soup1.prettify()) 和 html 用 read()读出来差别很大

和 read()读出来的内容（应该和 requests.get()得出来得内容一样）

soup1.prettify()

5 其他

soup1.text ? 全部文本内容？

soup1.a

soup1.find()

soup1.find_all()

soup1.文章来源地址https://www.toymoban.com/news/detail-480932.html

到了这里，关于python3 爬虫相关学习9：BeautifulSoup 官方文档学习的文章就介绍完了。如果您还想了解更多内容，请在右上角搜索TOY模板网以前的文章或继续浏览下面的相关文章，希望大家以后多多支持TOY模板网！

python3 爬虫相关学习9：BeautifulSoup 官方文档学习

1 BeautifulSoup 官方文档