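I have run into this problem myself, and many beginners never get a clear answer to it. Here we use writing to a CSV file after scraping data as the example:
Key point: the header row goes in first. Put it outside the scraping loop. If you are a beginner and this is not clear yet, don't worry; I will post the full code in a moment.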
import csv

# Column names: title, rent, payment method, lease term, rooms, contact, phone, address
header = ('标题', '租金', '付款方式', '可租年限', '房间数', '联系人', '电话', '地址')
with open('data_list.csv', 'a', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(header)
The program runs the code above first, which creates the CSV file with the header row already in it.
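Because the header step opens the file in 'a' (append) mode, running the script a second time adds another header row below the data that is already there. One possible way around that is to write the header only when the file is missing or empty. This is a minimal sketch, not part of the original code: the helper name append_rows, the demo file path, and the sample rows are all made up for illustration.

```python
import csv
import os
import tempfile


def append_rows(path, header, rows):
    """Append rows to a CSV file; write the header only if the file is new or empty."""
    need_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, 'a', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        if need_header:
            writer.writerow(header)
        writer.writerows(rows)


# Demo with made-up data in a temporary directory (hypothetical values)
header = ('title', 'rent')
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.csv')
append_rows(demo_path, header, [('Flat A', '1200')])
append_rows(demo_path, header, [('Flat B', '1500')])  # simulates running the script again

with open(demo_path, newline='', encoding='utf-8') as f:
    demo_rows = list(csv.reader(f))
print(demo_rows)
```

The second call finds a non-empty file, so it skips the header and only appends the new row.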
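One thing needs special attention here: name the tuple above header, not headers. Otherwise it will clash with the headers dict that holds the request headers used when fetching each URL.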
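To see what the clash looks like, here is a small sketch that deliberately uses the bad name for both things. It assumes the CSV header is written after the request headers have been assigned; the shortened values are placeholders. Python raises no error: the second assignment simply replaces the first, and csv.writer then iterates over the dict's keys.

```python
import csv
import io

headers = ('标题', '租金')            # meant to be the CSV header (bad name, for demonstration)
headers = {'User-Agent': 'demo'}     # request headers silently replace the tuple

buf = io.StringIO()
csv.writer(buf).writerow(headers)    # iterating a dict yields its keys
clash_output = buf.getvalue()
print(repr(clash_output))
```

The "header row" that ends up in the output is User-Agent, not the column names, which is why keeping the two names distinct matters.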
The full code follows. Article source: https://www.toymoban.com/news/detail-566795.html
import requests
import re
import csv
from bs4 import BeautifulSoup  # imported in the original; not used below

# Column names: title, rent, payment method, lease term, rooms, contact, phone, address
header = ('标题', '租金', '付款方式', '可租年限', '房间数', '联系人', '电话', '地址')

# Write the header once, before the scraping loop starts
with open('data_list.csv', 'a', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(header)

for i in range(1, 2):
    # Listing page number i
    url = 'https://www.xxxxxxxxx.com/xxxxx-' + str(i) + '-1.htm'
    # Request headers: note the name differs from `header` above
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36'
    }
    res = requests.get(url=url, headers=headers)
    res_content = res.text
    # Collect the link to every detail page on the listing page
    obj = re.compile(r'<a href="(?P<href>.*?)" class="article grid">', re.S)
    res_alists = obj.finditer(res_content)
    for a in res_alists:
        url_list = a.group('href')
        domain_url = 'https://www.xxxxxxx.com/' + url_list
        child_res = requests.get(url=domain_url, headers=headers)
        child_res_text = child_res.text
        # Extract the fields from the detail page
        child_obj = re.compile(r'<div class="ajtitle">(?P<title>.*?)</div>.*?<span class="shuzi">(?P<price>.*?) </span>.*?'
                               r'<div class="ajjiben ">.*?付款方式:</div><div class="txt">(?P<fukuan>.*?)</div></div>'
                               r'.*?出租年限:</div><div class="txt">(?P<nianxian>.*?)</div></div>'
                               r'.*?房 间:</div><div class="txt">(?P<fangjian>.*?)</div></div>'
                               r'.*?所在地址:</div><div class="txt">(?P<address>.*?)</div></div>'
                               r'.*?<div class="touxiang">.*?</style>(?P<person>.*?)</div>'
                               r'.*?<a class="call_tel" href=".*?">(?P<tel>.*?)</a>', re.S)
        child_content = child_obj.finditer(child_res_text)
        data_list = []
        data_list_new = []
        for child in child_content:
            title = child.group('title').replace('\r\n', '')
            price = child.group('price').replace('\r\n', '')
            fukuan = child.group('fukuan')
            nianxian = child.group('nianxian')
            fangjian = child.group('fangjian').replace(' ', '')
            # Tip: re.sub('([^\u4e00-\u9fa5\u0030-\u0039])', '', str1) removes every
            # character in str1 that is not a Chinese character or a digit
            fangjian1 = re.sub('([^\u4e00-\u9fa5\u0030-\u0039])', '', fangjian)
            person = child.group('person')
            person1 = re.sub('([^\u4e00-\u9fa5\u0030-\u0039])', '', person)
            tel = child.group('tel')
            address = child.group('address')
            data_list.extend([title, price, fukuan, nianxian, fangjian1, person1, tel, address])
            data_list_new.append(data_list)
        # Append this listing's row; skip pages where the pattern found nothing,
        # otherwise data_list_new[0] would raise IndexError
        if data_list_new:
            with open('data_list.csv', 'a', newline='') as f:
                writer = csv.writer(f)
                writer.writerow(data_list_new[0])
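The re.sub tip in the code is worth trying on its own. The character class [^\u4e00-\u9fa5\u0030-\u0039] matches anything outside the range U+4E00 to U+9FA5 (common Chinese characters) and outside the digits 0 to 9, so substituting an empty string keeps only Chinese characters and digits. A minimal sketch, where the helper name keep_cjk_and_digits and the sample string are made up for illustration:

```python
import re

# Matches every character that is neither in U+4E00..U+9FA5 nor an ASCII digit
NOT_CJK_OR_DIGIT = re.compile('[^\u4e00-\u9fa5\u0030-\u0039]')


def keep_cjk_and_digits(text):
    """Strip whitespace, HTML tags, Latin letters, and punctuation from text."""
    return NOT_CJK_OR_DIGIT.sub('', text)


sample = ' 3室 2厅\r\n<b>1卫</b> '   # hypothetical scraped fragment
cleaned = keep_cjk_and_digits(sample)
print(cleaned)
```

Note that Latin letters are removed too, so this cleanup only suits fields such as room counts or Chinese names, not values that may contain English text.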
I sincerely hope this helps beginners who are learning Python.