[Python Web Scraping and Data Analysis] Basic Data Structures


1. Overview

2. Properties

3. Lists

4. Dictionaries

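1. Overview

Python has four basic built-in data structures: list, tuple, set, and dictionary. They are provided by the interpreter and can be used directly, with no need to implement them by hand as in C or to include the STL headers as in C++.

Python's data structures are very flexible and place no restriction on element type: a single object can hold elements of several different types. This differs sharply from C/C++, where a container is declared for one element type only.

List: type name list, written with square brackets [ ]. A list can be freely modified and reassigned.
Tuple: type name tuple, written with parentheses ( ). A tuple cannot be modified after creation, so a tuple variable can only be given a value at initialization or be reassigned to another tuple.
Set: type name set, written with curly braces { }. A set can be modified and reassigned; its elements are unordered and unique, so duplicates are dropped as they are added.
Dictionary: type name dict, written with curly braces { } (sets and dictionaries both use curly braces, and an empty pair {} creates a dictionary). Every entry in a dict is a <Key: Value> pair, and a dict can be freely modified and reassigned. Its keys and values are available through keys() and values().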

lst1 = list     # note: this binds lst1 to the list class itself, not to a new list
lst2 = list()
lst3 = []
print(type(lst1), type(lst2), type(lst3))   
# <class 'type'> <class 'list'> <class 'list'>

tpl1 = tuple()
tpl2 = ()
print(type(tpl1), type(tpl2))   
# <class 'tuple'> <class 'tuple'>

s1 = set()
s2 = {}
s3 = {1}
print(type(s1), type(s2), type(s3))   
# <class 'set'> <class 'dict'> <class 'set'>

dct1 = dict()
dct2 = {}
print(type(dct1), type(dct2))   
# <class 'dict'> <class 'dict'>
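
The snippet above only creates empty objects. As a minimal sketch of the properties described in the overview (mixed element types, tuple immutability, set de-duplication), the following may help; the variable names are made up for this illustration.

```python
# A list may hold elements of different types and can be modified in place.
mixed = [1, 2.5, "three", [4, 5]]
mixed[0] = "one"
mixed.append(None)

# A tuple cannot be modified after creation; item assignment raises TypeError.
point = (1, 2)
try:
    point[0] = 10
    tuple_error = None
except TypeError as exc:
    tuple_error = type(exc).__name__

# A set drops duplicates when it is built.
unique = set([3, 1, 3, 2, 1])

print(mixed)        # ['one', 2.5, 'three', [4, 5], None]
print(tuple_error)  # TypeError
print(len(unique))  # 3
```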

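2. Properties

Lists, tuples, sets, and dictionaries place no restriction on element type: one object can hold elements of several different types. The one constraint is that set elements and dictionary keys must be hashable.

All four support reassignment: a variable that refers to one of them can be assigned another object at any time. The name is simply rebound, so if the new object is of a different type, the variable now refers to an object of that type. Assigning a newly built object with the same value also rebinds the name (the value is unchanged, but the address usually differs).

All four support iteration with a for loop. Iterating over a dict yields its keys by default; use keys(), values(), or items() to iterate over the keys, the values, or the key-value pairs explicitly.

Lists, tuples, and dictionaries can be nested in one another: an element can itself be another data structure, which in turn can contain further data structures, with no restriction on type. A set can be stored inside any of them, but its own elements must be hashable, so a set cannot contain lists, dictionaries, or other sets (a tuple of hashable items or a frozenset is allowed).

Lists and tuples support index access (random access), and slicing relies on the same ordered indexing. A value can be fetched by index, and the index() method returns the index of the first element equal to a given value.

Lists support concatenation: two lists can be joined into one with the + operator. A list can be joined with itself or with another list, and the result can be assigned back to either name or to a third one. Tuples support + in the same way, producing a new tuple.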
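
What a set can hold depends on whether the element is hashable. The following is a minimal sketch of that rule, with illustrative variable names; it is not meant as an exhaustive list of hashable types.

```python
# Tuples of hashable items and frozensets are hashable, so a set accepts them.
ok = {(1, 2), frozenset({3, 4}), "text", 5}

# Lists, dicts and ordinary sets are unhashable and are rejected with TypeError.
rejected = []
for candidate in ([1, 2], {"k": "v"}, {1, 2}):
    try:
        {candidate}  # try to build a one-element set
    except TypeError:
        rejected.append(type(candidate).__name__)

print(len(ok))    # 4
print(rejected)   # ['list', 'dict', 'set']
```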

# Lists, tuples, and dictionaries can contain one another; a set cannot contain lists, dictionaries, or sets
lst = [1, 2, 3, "hello", ["hello world", {"name": "张三", "age": 20}]]

tpl = (1, 2, 3, "hello", ["hello world", {"name": "张三", "age": 20}])

s = {1, 2, 3, "hello"}

dct = {
    "list": lst,
    "tuple": tpl,
    "set": s
}

# Lists and tuples support index access
print(lst[0])    # 1
print(tpl[0])    # 1
print(f"{lst.index('hello')}")    # 3
print(f"{tpl.index('hello')}")    # 3

# Iteration
for i in lst:
    print(i, end=' ')
print()
# 1 2 3 hello ['hello world', {'name': '张三', 'age': 20}]

for i in tpl:
    print(i, end=' ')
print()
# 1 2 3 hello ['hello world', {'name': '张三', 'age': 20}]

for i in s:
    print(i, end=' ')
print()
# 1 2 3 hello   (a set is unordered, so the order may differ)

for i in dct:
    print(i, end=' ')
print()
# list tuple set

for i in dct.keys():
    print(i, end=' ')
print()
# list tuple set

for i in dct.values():
    print(i, end=' ')
print()
# [1, 2, 3, 'hello', ['hello world', {'name': '张三', 'age': 20}]] (1, 2, 3, 'hello', ['hello world', {'name': '张三', 'age': 20}]) {1, 2, 3, 'hello'}

lst1 = [s]
print(lst1)     # [{1, 2, 3, 'hello'}]


# Lists (and tuples) can be concatenated with +, which builds a new object; sets and dictionaries do not support +
lst1 = lst1 + lst1
print(lst1)
# [{1, 2, 3, 'hello'}, {1, 2, 3, 'hello'}]

lst1 = lst + lst1
print(lst1)
# [1, 2, 3, 'hello', ['hello world', {'name': '张三', 'age': 20}], {3, 1, 'hello', 2}, {3, 1, 'hello', 2}]

lst2 = lst + lst
print(lst2)
# [1, 2, 3, 'hello', ['hello world', {'name': '张三', 'age': 20}], 1, 2, 3, 'hello', ['hello world', {'name': '张三', 'age': 20}]]
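
For comparison with list concatenation, here is a small sketch of how the other three types can be merged. It assumes nothing beyond built-in operations; dict also supports the | operator from Python 3.9 onward, which is left out here so the sketch runs on older versions.

```python
t = (1, 2) + (3,)        # tuples concatenate into a new tuple
s = {1, 2} | {2, 3}      # sets merge with the union operator
d = {"a": 1}
d.update({"b": 2})       # dicts merge in place with update()
merged = {**d, "c": 3}   # unpacking builds a new dict

# Sets do not support +.
try:
    {1} + {2}
    set_plus = "ok"
except TypeError:
    set_plus = "TypeError"

print(t)          # (1, 2, 3)
print(sorted(s))  # [1, 2, 3]
print(d)          # {'a': 1, 'b': 2}
print(merged)     # {'a': 1, 'b': 2, 'c': 3}
print(set_plus)   # TypeError
```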


# Rebinding
print(f'list_id = {id(lst)}')
print(f'tuple_id = {id(tpl)}')
print(f'set_id = {id(s)}')
print(f'dict_id = {id(dct)}')
print()

lst = [tpl, s]
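tpl = tuple(lst)        # note: (lst) without a trailing comma is just lst itself, not a tuple
s = {1, 2, 3, "hello"}  # rebind to a new object with the same value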
dct = {
    "key1": lst[0],
    "key2": tpl[0],
}

print(f'list_id = {id(lst)}')
print(f'tuple_id = {id(tpl)}')
print(f'set_id = {id(s)}')
print(f'dict_id = {id(dct)}')   # every name now refers to a new object, so the ids differ from those printed above
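
Rebinding a name is different from modifying an object in place. The sketch below, using illustrative names, contrasts the two for a list.

```python
a = [1, 2, 3]
alias = a                 # both names refer to the same list object
a.append(4)               # in-place change, visible through alias
same_after_append = alias is a

a = a + [5]               # builds a new list and rebinds the name a
same_after_rebind = alias is a

print(alias)              # [1, 2, 3, 4]
print(a)                  # [1, 2, 3, 4, 5]
print(same_after_append)  # True
print(same_after_rebind)  # False
```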

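3. Lists

Python's list is often compared with the vector and list containers of the C++ STL. In behavior it is closest to vector: it supports indexing by position and grows dynamically, while also offering the insert and remove operations associated with the C++ list.

In CPython, a list stores references to its elements in one contiguous array, while the element objects themselves live elsewhere on the heap. The addresses reported by id() for the elements therefore reflect how the interpreter allocates objects of each type, not the layout of the list. By comparison, a C++ vector stores its elements contiguously, and a C++ list is a linked list whose nodes are scattered in memory.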

lst1 = [1, 2, -1, 1.1, -1.1, "hello", "world", [1, 2], 3, 4]

for i in lst1:
    print(f'val = {i}, id = {id(i)}')
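
That a list holds references rather than copies can be seen without looking at addresses. The following is a minimal sketch with made-up names.

```python
inner = [1, 2]
outer = [inner, "text", 3.5]

# The list holds a reference to inner, not a copy of it.
print(outer[0] is inner)   # True

# A change made through inner is visible through outer.
inner.append(3)
print(outer[0])            # [1, 2, 3]

# Elements are reached by position, whatever their type.
print([type(x).__name__ for x in outer])  # ['list', 'str', 'float']
```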

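A Python list can be concatenated with the + operator or extended in place with the extend() method.

Slicing: selects part of the data by index range.

The in keyword tests membership and returns a Boolean.

pop() removes an element by index, and remove() removes the first element equal to a given value.

reverse() reverses the list in place, and clear() empties it.

Repetition is supported: multiplying a list by an integer n yields a list with its elements repeated n times.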

lst1 = [0, 1, 2, 3, 4]
lst2 = [5, 6, 7, 8, 9]

# Extend
lst1.extend(lst2)
print(lst1)     # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Slicing
print(lst1[:])      # no bounds given: all elements
print(lst1[1:])     # inclusive start: from index 1 to the end
print(lst1[:3])     # exclusive end: up to but not including index 3
print(lst1[1:5])    # half-open range [1, 5)
print(lst1[1:5:2])  # half-open range [1, 5) with step 2
print(lst1[1::3])   # from index 1 to the end with step 3
print(lst1[::-1])   # step -1: reversed order
print(lst1[-1])     # negative index counts from the end; -1 is the last element

# Membership test
print(1 in lst1)            # True
print(-1 in lst1)           # False
print('hello' not in lst1)  # True

# Removal
lst1.pop()      # pop() removes the last element by default
lst1.pop(1)
print(lst1)     # [0, 2, 3, 4, 5, 6, 7, 8]
lst1.remove(5)
print(lst1)      # [0, 2, 3, 4, 6, 7, 8]
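
The example above does not cover reverse(), clear(), or repetition with *. A short sketch of those three follows; the names are illustrative, and the nested-list case is included because repetition copies references rather than elements.

```python
nums = [1, 2, 3]
nums.reverse()          # reverses in place and returns None
print(nums)             # [3, 2, 1]

repeated = nums * 2     # new list with the elements repeated
print(repeated)         # [3, 2, 1, 3, 2, 1]

# Repetition copies references, so the nested list is shared.
grid = [[0]] * 3
grid[0].append(1)
print(grid)             # [[0, 1], [0, 1], [0, 1]]

nums.clear()            # removes every element
print(nums)             # []
```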

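Tuples and sets share many features with lists, but each has its own properties (a tuple is immutable; a set is unordered and holds unique elements). As a result, some list features are not available to them:

Neither tuples nor sets have extend() or reverse(). A tuple can still be concatenated with +, which builds a new tuple; a set does not support + but can be merged with | or update().
A tuple also cannot have elements removed, but it does support repetition with *, which builds a new tuple.
A set does not support repetition, index access, or slicing.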
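
The following minimal sketch shows what tuples and sets do offer in place of the missing list operations. The variable names are illustrative only.

```python
tpl = (1, 2)
print(tpl * 2)        # (1, 2, 1, 2)   repetition builds a new tuple
print(tpl[::-1])      # (2, 1)         a reversed copy via slicing

s = {1, 2}
s.add(3)              # insert one element
s.discard(1)          # remove an element; no error if it is absent
print(sorted(s))      # [2, 3]

# A set has no positions, so indexing raises TypeError.
try:
    s[0]
    index_result = "ok"
except TypeError:
    index_result = "TypeError"
print(index_result)   # TypeError
```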

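4. Dictionaries

A dict is similar to the map container of the C++ STL: both store <Key: Value> pairs, and keys must be unique. Unlike map, which keeps its keys sorted, a dict is a hash table and (since Python 3.7) preserves insertion order.

A dict exposes its contents through keys(), values(), and items(), which return views of the keys, the values, and the key-value pairs, making the data easy to manage.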

a = {}
b = dict()
print(type(a), type(b))
 
a = {
    'id': '0001',
    'name': '张三',
    'age': 20
}
print(a)
print(a['id'])
 
# Test whether a key exists
print('id' in a)        # True
print('class' in a)     # False
 
a['class'] = 1  # insert a key-value pair
print(a)
a['class'] = 2  # update the value of an existing key
print(a)
 
a.pop('id') # remove a key-value pair
print(a)
 
print(a.keys())     # view of the keys
print(a.values())   # view of the values
print(a.items())    # view of the key-value pairs
 
# Traversal
for key in a:
    print(key, a[key])
 
print()
for key, value in a.items():
    print(key, value)

# Quickly build a dict whose keys all share the same value
c = dict.fromkeys([1, 2, 3], "name")
print(c)
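
Two details are worth a short sketch: looking up a key that may be missing, and the fact that fromkeys() reuses a single value object for every key. The dictionary contents below are made up for illustration.

```python
profile = {"name": "张三", "age": 20}

# Indexing a missing key raises KeyError; get() returns a default instead.
print(profile.get("class"))      # None
print(profile.get("class", 0))   # 0

# setdefault() inserts the key only if it is missing.
profile.setdefault("class", 1)
profile.setdefault("age", 99)
print(profile)   # {'name': '张三', 'age': 20, 'class': 1}

# fromkeys() shares one value object across all keys.
shared = dict.fromkeys(["a", "b"], [])
shared["a"].append(1)
print(shared)    # {'a': [1], 'b': [1]}
```

If each key needs its own list, a dict comprehension such as {k: [] for k in keys} avoids the sharing.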
