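See GitHub for the complete paper reproduction
1. Content comparison
Box plot comparison
National contest paper C155
Reproduced result:
2. Reproduction code for Problem 1, sub-question 2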
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
plt.rcParams['font.sans-serif'] = ['SimHei']
xl_file = pd.ExcelFile("E:\\数学建模国赛\\2022数学建模赛题\\C题\\附件.xlsx")
sheet1 = xl_file.parse('表单1')
sheet2 = xl_file.parse('表单2')
sheet3 = xl_file.parse('表单3')
sheet1.head(), sheet2.head(), sheet3.head()
( 文物编号 纹饰 类型 颜色 表面风化
0 1 C 高钾 蓝绿 无风化
1 2 A 铅钡 浅蓝 风化
2 3 A 高钾 蓝绿 无风化
3 4 A 高钾 蓝绿 无风化
4 5 A 高钾 蓝绿 无风化,
文物采样点 二氧化硅(SiO2) 氧化钠(Na2O) 氧化钾(K2O) 氧化钙(CaO) 氧化镁(MgO) 氧化铝(Al2O3) \
0 01 69.33 NaN 9.99 6.32 0.87 3.93
1 02 36.28 NaN 1.05 2.34 1.18 5.73
2 03部位1 87.05 NaN 5.19 2.01 NaN 4.06
3 03部位2 61.71 NaN 12.37 5.87 1.11 5.50
4 04 65.88 NaN 9.67 7.12 1.56 6.44
氧化铁(Fe2O3) 氧化铜(CuO) 氧化铅(PbO) 氧化钡(BaO) 五氧化二磷(P2O5) 氧化锶(SrO) 氧化锡(SnO2) \
0 1.74 3.87 NaN NaN 1.17 NaN NaN
1 1.86 0.26 47.43 NaN 3.57 0.19 NaN
2 NaN 0.78 0.25 NaN 0.66 NaN NaN
3 2.16 5.09 1.41 2.86 0.70 0.10 NaN
4 2.06 2.18 NaN NaN 0.79 NaN NaN
二氧化硫(SO2)
0 0.39
1 NaN
2 NaN
3 NaN
4 0.36 ,
文物编号 表面风化 二氧化硅(SiO2) 氧化钠(Na2O) 氧化钾(K2O) 氧化钙(CaO) 氧化镁(MgO) 氧化铝(Al2O3) \
0 A1 无风化 78.45 NaN NaN 6.08 1.86 7.23
1 A2 风化 37.75 NaN NaN 7.63 NaN 2.33
2 A3 无风化 31.95 NaN 1.36 7.19 0.81 2.93
3 A4 无风化 35.47 NaN 0.79 2.89 1.05 7.07
4 A5 风化 64.29 1.2 0.37 1.64 2.34 12.75
氧化铁(Fe2O3) 氧化铜(CuO) 氧化铅(PbO) 氧化钡(BaO) 五氧化二磷(P2O5) 氧化锶(SrO) 氧化锡(SnO2) \
0 2.15 2.11 NaN NaN 1.06 0.03 NaN
1 NaN NaN 34.30 NaN 14.27 NaN NaN
2 7.06 0.21 39.58 4.69 2.68 0.52 NaN
3 6.45 0.96 24.28 8.31 8.45 0.28 NaN
4 0.81 0.94 12.23 2.16 0.19 0.21 0.49
二氧化硫(SO2)
0 0.51
1 NaN
2 NaN
3 NaN
4 NaN )
sheet2
| | 文物采样点 | 二氧化硅(SiO2) | 氧化钠(Na2O) | 氧化钾(K2O) | 氧化钙(CaO) | 氧化镁(MgO) | 氧化铝(Al2O3) | 氧化铁(Fe2O3) | 氧化铜(CuO) | 氧化铅(PbO) | 氧化钡(BaO) | 五氧化二磷(P2O5) | 氧化锶(SrO) | 氧化锡(SnO2) | 二氧化硫(SO2) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 01 | 69.33 | NaN | 9.99 | 6.32 | 0.87 | 3.93 | 1.74 | 3.87 | NaN | NaN | 1.17 | NaN | NaN | 0.39 |
| 1 | 02 | 36.28 | NaN | 1.05 | 2.34 | 1.18 | 5.73 | 1.86 | 0.26 | 47.43 | NaN | 3.57 | 0.19 | NaN | NaN |
| 2 | 03部位1 | 87.05 | NaN | 5.19 | 2.01 | NaN | 4.06 | NaN | 0.78 | 0.25 | NaN | 0.66 | NaN | NaN | NaN |
| 3 | 03部位2 | 61.71 | NaN | 12.37 | 5.87 | 1.11 | 5.50 | 2.16 | 5.09 | 1.41 | 2.86 | 0.70 | 0.10 | NaN | NaN |
| 4 | 04 | 65.88 | NaN | 9.67 | 7.12 | 1.56 | 6.44 | 2.06 | 2.18 | NaN | NaN | 0.79 | NaN | NaN | 0.36 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 64 | 54严重风化点 | 17.11 | NaN | NaN | NaN | 1.11 | 3.65 | NaN | 1.34 | 58.46 | NaN | 14.13 | 1.12 | NaN | NaN |
| 65 | 55 | 49.01 | 2.71 | NaN | 1.13 | NaN | 1.45 | NaN | 0.86 | 32.92 | 7.95 | 0.35 | NaN | NaN | NaN |
| 66 | 56 | 29.15 | NaN | NaN | 1.21 | NaN | 1.85 | NaN | 0.79 | 41.25 | 15.45 | 2.54 | NaN | NaN | NaN |
| 67 | 57 | 25.42 | NaN | NaN | 1.31 | NaN | 2.18 | NaN | 1.16 | 45.10 | 17.30 | NaN | NaN | NaN | NaN |
| 68 | 58 | 30.39 | NaN | 0.34 | 3.49 | 0.79 | 3.52 | 0.86 | 3.13 | 39.35 | 7.66 | 8.99 | 0.24 | NaN | NaN |

69 rows × 15 columns
component_cols = ['二氧化硅(SiO2)', '氧化钠(Na2O)', '氧化钾(K2O)', '氧化钙(CaO)', '氧化镁(MgO)',
'氧化铝(Al2O3)', '氧化铁(Fe2O3)', '氧化铜(CuO)', '氧化铅(PbO)', '氧化钡(BaO)',
'五氧化二磷(P2O5)', '氧化锶(SrO)', '氧化锡(SnO2)', '二氧化硫(SO2)']
sheet2 ['成分总和'] = sheet2 [component_cols].sum(axis=1)
sheet2 ['成分总和']
sheet2 = sheet2[(sheet2['成分总和'] >= 85) & (sheet2['成分总和'] <= 105)]
sheet2
sheet2 = sheet2.fillna(0)
sheet2[component_cols] = sheet2[component_cols].div(sheet2[component_cols].sum(axis=1), axis=0) * 100
sheet2 ['成分总和'] = sheet2 [component_cols].sum(axis=1)
sheet2
| | 文物采样点 | 二氧化硅(SiO2) | 氧化钠(Na2O) | 氧化钾(K2O) | 氧化钙(CaO) | 氧化镁(MgO) | 氧化铝(Al2O3) | 氧化铁(Fe2O3) | 氧化铜(CuO) | 氧化铅(PbO) | 氧化钡(BaO) | 五氧化二磷(P2O5) | 氧化锶(SrO) | 氧化锡(SnO2) | 二氧化硫(SO2) | 成分总和 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 01 | 71.027559 | 0.000000 | 10.234607 | 6.474746 | 0.891302 | 4.026227 | 1.782604 | 3.964758 | 0.000000 | 0.000000 | 1.198648 | 0.000000 | 0.0 | 0.399549 | 100.0 |
| 1 | 02 | 36.319952 | 0.000000 | 1.051156 | 2.342577 | 1.181299 | 5.736310 | 1.862048 | 0.260286 | 47.482230 | 0.000000 | 3.573931 | 0.190209 | 0.0 | 0.000000 | 100.0 |
| 2 | 03部位1 | 87.050000 | 0.000000 | 5.190000 | 2.010000 | 0.000000 | 4.060000 | 0.000000 | 0.780000 | 0.250000 | 0.000000 | 0.660000 | 0.000000 | 0.0 | 0.000000 | 100.0 |
| 3 | 03部位2 | 62.408981 | 0.000000 | 12.510113 | 5.936489 | 1.122573 | 5.562298 | 2.184466 | 5.147654 | 1.425971 | 2.892395 | 0.707929 | 0.101133 | 0.0 | 0.000000 | 100.0 |
| 4 | 04 | 68.582136 | 0.000000 | 10.066625 | 7.412034 | 1.623985 | 6.704143 | 2.144493 | 2.269415 | 0.000000 | 0.000000 | 0.822403 | 0.000000 | 0.0 | 0.374766 | 100.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 64 | 54严重风化点 | 17.653735 | 0.000000 | 0.000000 | 0.000000 | 1.145274 | 3.765993 | 0.000000 | 1.382584 | 60.317788 | 0.000000 | 14.579034 | 1.155592 | 0.0 | 0.000000 | 100.0 |
| 65 | 55 | 50.850799 | 2.811787 | 0.000000 | 1.172442 | 0.000000 | 1.504462 | 0.000000 | 0.892301 | 34.156464 | 8.248599 | 0.363146 | 0.000000 | 0.0 | 0.000000 | 100.0 |
| 66 | 56 | 31.602342 | 0.000000 | 0.000000 | 1.311795 | 0.000000 | 2.005637 | 0.000000 | 0.856461 | 44.720295 | 16.749783 | 2.753686 | 0.000000 | 0.0 | 0.000000 | 100.0 |
| 67 | 57 | 27.489997 | 0.000000 | 0.000000 | 1.416676 | 0.000000 | 2.357521 | 0.000000 | 1.254461 | 48.772575 | 18.708770 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 100.0 |
| 68 | 58 | 30.771567 | 0.000000 | 0.344269 | 3.533819 | 0.799919 | 3.564196 | 0.870798 | 3.169299 | 39.844066 | 7.756177 | 9.102876 | 0.243013 | 0.0 | 0.000000 | 100.0 |

67 rows × 16 columns
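The validity filter and the rescaling to 100% used above can be tried on a small made-up table. The following is a minimal sketch rather than the author's code: the helper name `filter_and_close`, the toy values, and the three-column subset are assumptions for illustration; only the 85 to 105 bounds come from the cell above.

```python
import numpy as np
import pandas as pd

def filter_and_close(df, cols, low=85.0, high=105.0):
    """Keep rows whose component total lies in [low, high], then rescale to 100."""
    totals = df[cols].sum(axis=1)            # NaN entries are skipped by sum()
    valid = df[(totals >= low) & (totals <= high)].copy()
    valid[cols] = valid[cols].fillna(0)      # undetected components count as 0
    valid[cols] = valid[cols].div(valid[cols].sum(axis=1), axis=0) * 100
    return valid

# Hypothetical toy data: row 'b' sums to 70 and should be dropped
toy = pd.DataFrame({
    'sample': ['a', 'b', 'c'],
    'SiO2': [60.0, 40.0, 50.0],
    'K2O': [30.0, np.nan, 20.0],
    'PbO': [np.nan, 30.0, 20.0],
})
cols = ['SiO2', 'K2O', 'PbO']
closed = filter_and_close(toy, cols)
print(closed)
```

After this step every remaining row sums to 100, which is why the `成分总和` column in the table above is 100.0 throughout.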
sheet2_copy = sheet2.copy()
sheet2=sheet2_copy
new_component_cols = ['SiO2', 'Na2O', 'K2O', 'CaO', 'MgO',
'Al2O3', 'Fe2O3', 'CuO', 'PbO', 'BaO',
'P2O5', 'SrO', 'SnO2', 'SO2']
rename_dict = dict(zip(component_cols, new_component_cols))
sheet2.rename(columns=rename_dict, inplace=True)
sheet2.columns
Index(['文物采样点', 'SiO2', 'Na2O', 'K2O', 'CaO', 'MgO', 'Al2O3', 'Fe2O3', 'CuO',
'PbO', 'BaO', 'P2O5', 'SrO', 'SnO2', 'SO2', '成分总和'],
dtype='object')
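2.1 Merging the sheets
import re
def extract_number(s):
    # Take the leading digits of a sampling-point label, e.g. '03部位1' -> 3
    match = re.match(r"(\d+)", s)
    return int(match.group()) if match else None
# Derive the artifact number (文物编号) from the sampling-point label, then join with sheet1
sheet2['文物编号'] = sheet2['文物采样点'].apply(extract_number)
data = pd.merge(sheet1, sheet2, on='文物编号')
data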
| | 文物编号 | 纹饰 | 类型 | 颜色 | 表面风化 | 文物采样点 | SiO2 | Na2O | K2O | CaO | ... | Al2O3 | Fe2O3 | CuO | PbO | BaO | P2O5 | SrO | SnO2 | SO2 | 成分总和 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | C | 高钾 | 蓝绿 | 无风化 | 01 | 71.027559 | 0.000000 | 10.234607 | 6.474746 | ... | 4.026227 | 1.782604 | 3.964758 | 0.000000 | 0.000000 | 1.198648 | 0.000000 | 0.0 | 0.399549 | 100.0 |
| 1 | 2 | A | 铅钡 | 浅蓝 | 风化 | 02 | 36.319952 | 0.000000 | 1.051156 | 2.342577 | ... | 5.736310 | 1.862048 | 0.260286 | 47.482230 | 0.000000 | 3.573931 | 0.190209 | 0.0 | 0.000000 | 100.0 |
| 2 | 3 | A | 高钾 | 蓝绿 | 无风化 | 03部位1 | 87.050000 | 0.000000 | 5.190000 | 2.010000 | ... | 4.060000 | 0.000000 | 0.780000 | 0.250000 | 0.000000 | 0.660000 | 0.000000 | 0.0 | 0.000000 | 100.0 |
| 3 | 3 | A | 高钾 | 蓝绿 | 无风化 | 03部位2 | 62.408981 | 0.000000 | 12.510113 | 5.936489 | ... | 5.562298 | 2.184466 | 5.147654 | 1.425971 | 2.892395 | 0.707929 | 0.101133 | 0.0 | 0.000000 | 100.0 |
| 4 | 4 | A | 高钾 | 蓝绿 | 无风化 | 04 | 68.582136 | 0.000000 | 10.066625 | 7.412034 | ... | 6.704143 | 2.144493 | 2.269415 | 0.000000 | 0.000000 | 0.822403 | 0.000000 | 0.0 | 0.374766 | 100.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 62 | 54 | C | 铅钡 | 浅蓝 | 风化 | 54严重风化点 | 17.653735 | 0.000000 | 0.000000 | 0.000000 | ... | 3.765993 | 0.000000 | 1.382584 | 60.317788 | 0.000000 | 14.579034 | 1.155592 | 0.0 | 0.000000 | 100.0 |
| 63 | 55 | C | 铅钡 | 绿 | 无风化 | 55 | 50.850799 | 2.811787 | 0.000000 | 1.172442 | ... | 1.504462 | 0.000000 | 0.892301 | 34.156464 | 8.248599 | 0.363146 | 0.000000 | 0.0 | 0.000000 | 100.0 |
| 64 | 56 | C | 铅钡 | 蓝绿 | 风化 | 56 | 31.602342 | 0.000000 | 0.000000 | 1.311795 | ... | 2.005637 | 0.000000 | 0.856461 | 44.720295 | 16.749783 | 2.753686 | 0.000000 | 0.0 | 0.000000 | 100.0 |
| 65 | 57 | C | 铅钡 | 蓝绿 | 风化 | 57 | 27.489997 | 0.000000 | 0.000000 | 1.416676 | ... | 2.357521 | 0.000000 | 1.254461 | 48.772575 | 18.708770 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 100.0 |
| 66 | 58 | C | 铅钡 | NaN | 风化 | 58 | 30.771567 | 0.000000 | 0.344269 | 3.533819 | ... | 3.564196 | 0.870798 | 3.169299 | 39.844066 | 7.756177 | 9.102876 | 0.243013 | 0.0 | 0.000000 | 100.0 |

67 rows × 21 columns
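The merge works because several sampling points can share one artifact number: labels such as `03部位1` and `03部位2` both reduce to artifact 3. A small sketch of that mapping follows; the two toy frames are made up for illustration and stand in for `sheet1` and `sheet2`, and the `str()` cast is an added safeguard that the original function does not have.

```python
import re
import pandas as pd

def extract_number(label):
    """Return the leading integer of a sampling-point label, or None if absent."""
    match = re.match(r"(\d+)", str(label))
    return int(match.group(1)) if match else None

labels = ['01', '03部位1', '03部位2', '54严重风化点']
ids = [extract_number(s) for s in labels]

# Hypothetical stand-ins for sheet1 (one row per artifact) and sheet2 (one row per sample)
artifacts = pd.DataFrame({'文物编号': [1, 3, 54], '类型': ['高钾', '高钾', '铅钡']})
samples = pd.DataFrame({'文物采样点': labels})
samples['文物编号'] = samples['文物采样点'].apply(extract_number)

# Inner join: artifact 3 appears twice because it has two sampling points
merged = pd.merge(artifacts, samples, on='文物编号')
print(merged)
```

This one-to-many relationship is the reason the merged table above has 67 rows, one per sampling point, rather than one per artifact.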
data.drop(['颜色','纹饰','文物编号','成分总和'],axis=1,inplace=True)
data
| | 类型 | 表面风化 | 文物采样点 | SiO2 | Na2O | K2O | CaO | MgO | Al2O3 | Fe2O3 | CuO | PbO | BaO | P2O5 | SrO | SnO2 | SO2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 高钾 | 无风化 | 01 | 71.027559 | 0.000000 | 10.234607 | 6.474746 | 0.891302 | 4.026227 | 1.782604 | 3.964758 | 0.000000 | 0.000000 | 1.198648 | 0.000000 | 0.0 | 0.399549 |
| 1 | 铅钡 | 风化 | 02 | 36.319952 | 0.000000 | 1.051156 | 2.342577 | 1.181299 | 5.736310 | 1.862048 | 0.260286 | 47.482230 | 0.000000 | 3.573931 | 0.190209 | 0.0 | 0.000000 |
| 2 | 高钾 | 无风化 | 03部位1 | 87.050000 | 0.000000 | 5.190000 | 2.010000 | 0.000000 | 4.060000 | 0.000000 | 0.780000 | 0.250000 | 0.000000 | 0.660000 | 0.000000 | 0.0 | 0.000000 |
| 3 | 高钾 | 无风化 | 03部位2 | 62.408981 | 0.000000 | 12.510113 | 5.936489 | 1.122573 | 5.562298 | 2.184466 | 5.147654 | 1.425971 | 2.892395 | 0.707929 | 0.101133 | 0.0 | 0.000000 |
| 4 | 高钾 | 无风化 | 04 | 68.582136 | 0.000000 | 10.066625 | 7.412034 | 1.623985 | 6.704143 | 2.144493 | 2.269415 | 0.000000 | 0.000000 | 0.822403 | 0.000000 | 0.0 | 0.374766 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 62 | 铅钡 | 风化 | 54严重风化点 | 17.653735 | 0.000000 | 0.000000 | 0.000000 | 1.145274 | 3.765993 | 0.000000 | 1.382584 | 60.317788 | 0.000000 | 14.579034 | 1.155592 | 0.0 | 0.000000 |
| 63 | 铅钡 | 无风化 | 55 | 50.850799 | 2.811787 | 0.000000 | 1.172442 | 0.000000 | 1.504462 | 0.000000 | 0.892301 | 34.156464 | 8.248599 | 0.363146 | 0.000000 | 0.0 | 0.000000 |
| 64 | 铅钡 | 风化 | 56 | 31.602342 | 0.000000 | 0.000000 | 1.311795 | 0.000000 | 2.005637 | 0.000000 | 0.856461 | 44.720295 | 16.749783 | 2.753686 | 0.000000 | 0.0 | 0.000000 |
| 65 | 铅钡 | 风化 | 57 | 27.489997 | 0.000000 | 0.000000 | 1.416676 | 0.000000 | 2.357521 | 0.000000 | 1.254461 | 48.772575 | 18.708770 | 0.000000 | 0.000000 | 0.0 | 0.000000 |
| 66 | 铅钡 | 风化 | 58 | 30.771567 | 0.000000 | 0.344269 | 3.533819 | 0.799919 | 3.564196 | 0.870798 | 3.169299 | 39.844066 | 7.756177 | 9.102876 | 0.243013 | 0.0 | 0.000000 |

67 rows × 17 columns
data.shape
(67, 17)
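2.2 Normality check of the data
"""
For some statistical analyses, such as regression, normality of the data is a key assumption.
Whether a transformation is needed, however, depends on the properties of the data and on the goal of the analysis.
Let us first look at the data.
These are chemical composition data, and the earlier analysis showed that their distributions are not fully normal,
so I recommend running the analysis after a centered log-ratio (CLR) transform. This brings the data closer to the assumptions of the statistical analysis and handles the constraints of compositional data better.
"""
import matplotlib.pyplot as plt
numeric_cols = data.select_dtypes(include='number').columns
2.2.1 Histograms for the normality check
# One histogram per numeric (oxide) column, stacked vertically
fig, axs = plt.subplots(len(numeric_cols), figsize=(10, len(numeric_cols)*3))
for i, col in enumerate(numeric_cols):
    axs[i].hist(data[col].dropna(), bins=30, color='skyblue', edgecolor='black', alpha=0.7)
    axs[i].set_title(f'Histogram of {col}')
plt.tight_layout()
plt.show()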
data_raw=data.copy()
"""
Normality test: we use the Shapiro-Wilk test to check the normality of each chemical component.
It is a commonly used normality test whose null hypothesis is that the data come from a normal distribution.
If the p-value is below 0.05, we reject the null hypothesis and conclude that the data are not normally distributed.
"""
from scipy.stats import shapiro, levene
rows = []
for col in numeric_cols:
    col_results = {'Variable': col}
    # Shapiro-Wilk test: H0 is that the column is normally distributed
    _, p_normal = shapiro(data[col].dropna())
    col_results['Normality p-value'] = p_normal
    col_results['Normal'] = p_normal > 0.05
    if col_results['Normal']:
        # Levene test for equal variances between the two groups
        # ('无风化' = unweathered, '风化' = weathered)
        _, p_equal_var = levene(data.loc[data['表面风化'] == '无风化', col].dropna(),
                                data.loc[data['表面风化'] == '风化', col].dropna())
        col_results['Equal var p-value'] = p_equal_var
        col_results['Equal var'] = p_equal_var > 0.05
    rows.append(col_results)
# DataFrame.append was removed in pandas 2.0, so build the frame from the collected rows
test_results = pd.DataFrame(rows)
test_results
| | Variable | Normality p-value | Normal | Equal var p-value | Equal var |
|---|---|---|---|---|---|
| 0 | SiO2 | 5.434923e-02 | True | 0.009129 | False |
| 1 | Na2O | 5.631047e-13 | False | NaN | NaN |
| 2 | K2O | 2.218287e-13 | False | NaN | NaN |
| 3 | CaO | 8.905178e-06 | False | NaN | NaN |
| 4 | MgO | 1.066307e-05 | False | NaN | NaN |
| 5 | Al2O3 | 1.085733e-06 | False | NaN | NaN |
| 6 | Fe2O3 | 1.809425e-09 | False | NaN | NaN |
| 7 | CuO | 3.633815e-09 | False | NaN | NaN |
| 8 | PbO | 7.531955e-04 | False | NaN | NaN |
| 9 | BaO | 7.773099e-08 | False | NaN | NaN |
| 10 | P2O5 | 4.346846e-09 | False | NaN | NaN |
| 11 | SrO | 6.648307e-06 | False | NaN | NaN |
| 12 | SnO2 | 8.658932e-17 | False | NaN | NaN |
| 13 | SO2 | 5.878219e-17 | False | NaN | NaN |
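The decision rule used for the table above (reject normality when the Shapiro-Wilk p-value is below 0.05) can be seen on two synthetic columns. This is only an illustrative sketch: the helper `looks_normal` and both arrays are assumptions, chosen so that one column is bell-shaped and the other is dominated by zeros, much like the sparse oxide columns here.

```python
import numpy as np
from scipy.stats import norm, shapiro

def looks_normal(values, alpha=0.05):
    """Return (p_value, decision) for a Shapiro-Wilk test at significance level alpha."""
    _, p_value = shapiro(values)
    return p_value, p_value > alpha

n = 67  # same sample size as the filtered data set

# Evenly spaced normal quantiles: an idealised bell-shaped column
bell = norm.ppf((np.arange(1, n + 1) - 0.5) / n)

# A column that is zero for most samples, like Na2O or SnO2 after fillna(0)
zero_heavy = np.concatenate([np.zeros(50), np.linspace(1.0, 5.0, 17)])

p_bell, bell_ok = looks_normal(bell)
p_zero, zero_ok = looks_normal(zero_heavy)
print(f"bell-shaped: p={p_bell:.3g}, normal={bell_ok}")
print(f"zero-heavy:  p={p_zero:.3g}, normal={zero_ok}")
```

The many zeros introduced by `fillna(0)` are one plausible reason why almost every oxide fails the test in the table above.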
2.3 Normality not satisfied: applying the centered log-ratio (CLR) transform
from scipy.stats.mstats import gmean
data_centralized = data.copy()
numeric_data = data_centralized.select_dtypes(include='number')
# Row-wise geometric mean, computed over the non-zero components only
geo_means = []
for index, row in numeric_data.iterrows():
    non_zero_values = row[row > 0]
    geo_mean = gmean(non_zero_values) if len(non_zero_values) > 0 else 1e-6
    geo_means.append(geo_mean)
# CLR: log of each component divided by its row's geometric mean (zeros become -inf)
for col in numeric_data.columns:
    data_centralized[col] = np.log(numeric_data[col] / geo_means)
data_centralized.head()
D:\py1.1\envs\pytorch\lib\site-packages\pandas\core\arraylike.py:402: RuntimeWarning: divide by zero encountered in log
result = getattr(ufunc, method)(*inputs, **kwargs)
| | 类型 | 表面风化 | 文物采样点 | SiO2 | Na2O | K2O | CaO | MgO | Al2O3 | Fe2O3 | CuO | PbO | BaO | P2O5 | SrO | SnO2 | SO2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 高钾 | 无风化 | 01 | 3.045978 | -inf | 1.108685 | 0.650820 | -1.332161 | 0.175740 | -0.639014 | 0.160355 | -inf | -inf | -1.035896 | -inf | -inf | -2.134508 |
| 1 | 铅钡 | 风化 | 02 | 2.676664 | -inf | -0.865813 | -0.064452 | -0.749089 | 0.831113 | -0.294026 | -2.261677 | 2.944652 | -inf | 0.357963 | -2.575334 | -inf | -inf |
| 2 | 高钾 | 无风化 | 03部位1 | 3.586159 | -inf | 0.766410 | -0.182189 | -inf | 0.520860 | -inf | -1.128785 | -2.266618 | -inf | -1.295839 | -inf | -inf | -inf |
| 3 | 高钾 | 无风化 | 03部位2 | 3.090699 | -inf | 1.483527 | 0.738107 | -0.927387 | 0.673001 | -0.261639 | 0.595531 | -0.688158 | 0.019074 | -1.388422 | -3.334332 | -inf | -inf |
| 4 | 高钾 | 无风化 | 04 | 2.968764 | -inf | 1.049957 | 0.743836 | -0.774386 | 0.643457 | -0.496365 | -0.439747 | -inf | -inf | -1.454794 | -inf | -inf | -2.240723 |
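For reference, the textbook CLR transform of a composition with strictly positive parts is the log of each part minus the mean of the logs (equivalently, the log of each part divided by the geometric mean). The sketch below shows that definition on a made-up three-part composition; the function name `clr` and the numbers are assumptions, and it deliberately does not handle zeros, which is where the `-inf` values above come from.

```python
import numpy as np

def clr(composition):
    """Centered log-ratio of a strictly positive composition (1-D array)."""
    x = np.asarray(composition, dtype=float)
    log_x = np.log(x)
    # Subtracting the mean log is the same as dividing by the geometric mean
    return log_x - log_x.mean()

parts = np.array([70.0, 20.0, 10.0])   # hypothetical percentages
z = clr(parts)
print(z)

# Two properties worth checking: CLR values sum to zero,
# and rescaling the composition (percent vs. fractions) does not change them.
print(z.sum())
print(np.allclose(z, clr(parts / 100.0)))
```

The notebook's variant differs in one respect: it takes the geometric mean over the non-zero parts only, so the transformed rows no longer need to sum to zero.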
2.3.1 Key step: replacing -inf with 0
selected_cols=new_component_cols
data_centralized.replace(-np.inf, 0, inplace=True)
data_centralized
| | 类型 | 表面风化 | 文物采样点 | SiO2 | Na2O | K2O | CaO | MgO | Al2O3 | Fe2O3 | CuO | PbO | BaO | P2O5 | SrO | SnO2 | SO2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 高钾 | 无风化 | 01 | 3.045978 | 0.000000 | 1.108685 | 0.650820 | -1.332161 | 0.175740 | -0.639014 | 0.160355 | 0.000000 | 0.000000 | -1.035896 | 0.000000 | 0.0 | -2.134508 |
| 1 | 铅钡 | 风化 | 02 | 2.676664 | 0.000000 | -0.865813 | -0.064452 | -0.749089 | 0.831113 | -0.294026 | -2.261677 | 2.944652 | 0.000000 | 0.357963 | -2.575334 | 0.0 | 0.000000 |
| 2 | 高钾 | 无风化 | 03部位1 | 3.586159 | 0.000000 | 0.766410 | -0.182189 | 0.000000 | 0.520860 | 0.000000 | -1.128785 | -2.266618 | 0.000000 | -1.295839 | 0.000000 | 0.0 | 0.000000 |
| 3 | 高钾 | 无风化 | 03部位2 | 3.090699 | 0.000000 | 1.483527 | 0.738107 | -0.927387 | 0.673001 | -0.261639 | 0.595531 | -0.688158 | 0.019074 | -1.388422 | -3.334332 | 0.0 | 0.000000 |
| 4 | 高钾 | 无风化 | 04 | 2.968764 | 0.000000 | 1.049957 | 0.743836 | -0.774386 | 0.643457 | -0.496365 | -0.439747 | 0.000000 | 0.000000 | -1.454794 | 0.000000 | 0.0 | -2.240723 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 62 | 铅钡 | 风化 | 54严重风化点 | 1.216607 | 0.000000 | 0.000000 | 0.000000 | -1.518696 | -0.328329 | 0.000000 | -1.330386 | 2.445287 | 0.000000 | 1.025244 | -1.509727 | 0.0 | 0.000000 |
| 63 | 铅钡 | 无风化 | 55 | 2.673354 | -0.221722 | 0.000000 | -1.096453 | 0.000000 | -0.847107 | 0.000000 | -1.369493 | 2.275410 | 0.854502 | -2.268492 | 0.000000 | 0.0 | 0.000000 |
| 64 | 铅钡 | 风化 | 56 | 1.753603 | 0.000000 | 0.000000 | -1.428231 | 0.000000 | -1.003666 | 0.000000 | -1.854574 | 2.100799 | 1.118757 | -0.686688 | 0.000000 | 0.0 | 0.000000 |
| 65 | 铅钡 | 风化 | 57 | 1.386720 | 0.000000 | 0.000000 | -1.578789 | 0.000000 | -1.069491 | 0.000000 | -1.700396 | 1.960066 | 1.001890 | 0.000000 | 0.000000 | 0.0 | 0.000000 |
| 66 | 铅钡 | 风化 | 58 | 2.316326 | 0.000000 | -2.176597 | 0.152115 | -1.333510 | 0.160674 | -1.248610 | 0.043246 | 2.574709 | 0.938225 | 1.098326 | -2.524904 | 0.0 | 0.000000 |

67 rows × 17 columns
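The two steps above (CLR against the geometric mean of the non-zero parts, then `-inf` replaced by 0) can be checked on a single row. The sketch below re-implements them in a hypothetical helper `clr_nonzero` and applies it to the normalized values of sample `01` taken from the earlier table; it is meant as a way to verify the numbers, not as the author's code.

```python
import numpy as np
import pandas as pd

def clr_nonzero(row):
    """CLR using the geometric mean of the positive parts; zeros map to 0 afterwards."""
    positive = row[row > 0]
    log_gm = np.log(positive).mean()          # log of the geometric mean
    with np.errstate(divide='ignore'):        # log(0) = -inf is expected here
        out = np.log(row) - log_gm
    return out.replace(-np.inf, 0.0)

# Normalized composition of sample 01 (values copied from the table in section 2)
row = pd.Series({
    'SiO2': 71.027559, 'Na2O': 0.0, 'K2O': 10.234607, 'CaO': 6.474746,
    'MgO': 0.891302, 'Al2O3': 4.026227, 'Fe2O3': 1.782604, 'CuO': 3.964758,
    'PbO': 0.0, 'BaO': 0.0, 'P2O5': 1.198648, 'SrO': 0.0, 'SnO2': 0.0,
    'SO2': 0.399549,
})
z = clr_nonzero(row)
print(z.round(6))
```

The results should agree with row 0 of the table above to the printed precision. One point to keep in mind when reading later statistics: in CLR space a value of 0 means "equal to the row's geometric mean", so after the replacement an absent oxide is indistinguishable from one present at that level.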
2.3.2 Effect of the centered log-ratio transform
plt.rcParams['font.family'] = 'DejaVu Sans'
# Left column: raw percentages; right column: the same component after CLR
fig, axs = plt.subplots(len(selected_cols), 2, figsize=(15, len(selected_cols)*3))
for i, col in enumerate(selected_cols):
    axs[i, 0].hist(data_raw[col].dropna(), bins=30, color='skyblue', edgecolor='black', alpha=0.7)
    axs[i, 0].set_title(f'Raw data: {col}')
    axs[i, 1].hist(data_centralized[col].dropna(), bins=30, color='salmon', edgecolor='black', alpha=0.7)
    axs[i, 1].set_title(f'Centralized Log Ratio: {col}')
plt.tight_layout()
plt.show()
data=data_centralized
glass_types = data['类型'].unique()
weathering_states = data['表面风化'].unique()
glass_types, weathering_states
(array(['高钾', '铅钡'], dtype=object), array(['无风化', '风化'], dtype=object))
grouped_stats = pd.DataFrame()
component_cols = ['SiO2', 'Na2O', 'K2O', 'CaO', 'MgO',
'Al2O3', 'Fe2O3', 'CuO', 'PbO', 'BaO',
'P2O5', 'SrO', 'SnO2', 'SO2']
for component in component_cols:
    # Descriptive statistics per (glass type, weathering state) group
    component_data = data.groupby(['类型', '表面风化'])[component]
    stats = component_data.agg(['mean', 'max', 'min', 'std', 'var', 'skew'])
    # Each group is a Series, so use Series.kurt (DataFrame.kurt fails on newer pandas)
    stats['kurt'] = component_data.apply(pd.Series.kurt)
    stats['cv'] = stats['std'] / stats['mean']
    stats.columns = pd.MultiIndex.from_product([[component], stats.columns])
    grouped_stats = pd.concat([grouped_stats, stats], axis=1)
grouped_stats
| 类型 | 表面风化 | SiO2 mean | SiO2 max | SiO2 min | SiO2 std | SiO2 var | SiO2 skew | SiO2 kurt | SiO2 cv | Na2O mean | Na2O max | ... | SnO2 kurt | SnO2 cv | SO2 mean | SO2 max | SO2 min | SO2 std | SO2 var | SO2 skew | SO2 kurt | SO2 cv |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 铅钡 | 无风化 | 3.013743 | 3.871521 | 1.859524 | 0.646195 | 0.417567 | -0.301305 | -0.956815 | 0.214416 | 0.071131 | 0.876318 | ... | 3.253187 | -2.441987 | 0.020569 | 0.267396 | 0.000000 | 0.074162 | 0.005500 | 3.605551 | 13.000000 | 3.605551 |
| 铅钡 | 风化 | 2.242329 | 3.937307 | -0.131353 | 0.923780 | 0.853370 | -0.584811 | 0.650707 | 0.411973 | 0.013371 | 1.043858 | ... | 13.632917 | -3.664983 | 0.028021 | 1.369229 | -0.796562 | 0.336451 | 0.113199 | 2.108909 | 9.857280 | 12.007019 |
| 高钾 | 无风化 | 3.165687 | 3.712288 | 2.266609 | 0.363205 | 0.131918 | -1.093726 | 3.036563 | 0.114732 | -0.013585 | 0.320182 | ... | 12.000000 | -3.464102 | -0.507620 | 0.000000 | -2.240723 | 0.925901 | 0.857292 | -1.388056 | -0.011455 | -1.824002 |
| 高钾 | 风化 | 4.187045 | 4.372977 | 3.830498 | 0.187388 | 0.035114 | -1.731995 | 3.641136 | 0.044754 | 0.000000 | 0.000000 | ... | 0.000000 | NaN | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | NaN |

4 rows × 112 columns
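The per-group statistics in this table follow a simple pattern: aggregate each group, add the kurtosis, then divide the standard deviation by the mean to get the coefficient of variation (cv). A small sketch on made-up data is given here; the frame `toy` and its column names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data: group 'b' is symmetric, group 'a' has one large value
toy = pd.DataFrame({
    'group': ['a'] * 5 + ['b'] * 5,
    'value': [1.0, 2.0, 3.0, 4.0, 10.0, 2.0, 2.5, 3.0, 3.5, 4.0],
})

grouped = toy.groupby('group')['value']
stats = grouped.agg(['mean', 'max', 'min', 'std', 'var', 'skew'])
stats['kurt'] = grouped.apply(pd.Series.kurt)   # sample excess kurtosis per group
stats['cv'] = stats['std'] / stats['mean']      # coefficient of variation
print(stats)
```

Because cv divides by the mean, it becomes very large or changes sign when the mean is close to zero, which happens often for CLR-transformed values; that would explain entries such as 12.007019 or the NaN values (0/0) in the table above.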
tables_dict = {}
# Build one statistics table per (glass type, weathering state) combination
for glass_type in glass_types:
    for weathering_state in weathering_states:
        subset = grouped_stats.loc[glass_type, weathering_state].unstack().T
        table_name = f"{glass_type}_{weathering_state}"
        tables_dict[table_name] = pd.DataFrame(subset)
tables_dict
{'高钾_无风化': Al2O3 BaO CaO CuO Fe2O3 K2O MgO \
cv 0.664393 -1.972230 0.893838 -2.321136 -1.626433 0.473990 -0.700958
kurt -1.409964 3.016385 -0.156702 1.577446 0.472540 1.635379 -1.292382
max 1.508084 0.019074 1.647769 0.595531 0.747950 2.210662 0.000000
mean 0.776104 -0.179823 0.599071 -0.262942 -0.390464 1.145963 -0.674968
min 0.006978 -1.080913 -0.182189 -1.652716 -1.590841 0.000000 -1.332161
skew -0.031480 -1.906416 0.378894 -1.180633 -0.394538 -0.184857 0.061519
std 0.515638 0.354653 0.535473 0.610324 0.635064 0.543175 0.473124
var 0.265882 0.125778 0.286731 0.372495 0.403306 0.295039 0.223846
Na2O P2O5 PbO SO2 SiO2 SnO2 SrO
cv -19.285768 -0.979906 -1.116780 -1.824002 0.114732 -3.464102 -1.050200
kurt 7.015733 0.317255 -1.629147 -0.011455 3.036563 12.000000 -2.376521
max 0.320182 0.526955 0.000000 0.000000 3.712288 0.000000 0.000000
mean -0.013585 -0.938500 -0.987338 -0.507620 3.165687 -0.007795 -1.723790
min -0.760277 -2.730275 -2.672140 -2.240723 2.266609 -0.093536 -3.774602
skew -2.150622 0.057567 -0.552251 -1.388056 -1.093726 -3.464102 -0.037176
std 0.262001 0.919641 1.102639 0.925901 0.363205 0.027002 1.810324
var 0.068645 0.845740 1.215812 0.857292 0.131918 0.000729 3.277274 ,
'高钾_风化': Al2O3 BaO CaO CuO Fe2O3 K2O MgO Na2O \
cv 2.498627 NaN -0.962261 -8.191497 -0.250545 -0.997049 -1.572791 NaN
kurt 0.025390 0.0 2.287842 0.619598 1.095297 -0.867476 -1.112631 0.0
max 0.961580 0.0 0.215634 0.477459 -1.341006 0.000000 0.000000 0.0
mean 0.194529 0.0 -0.664817 -0.060020 -1.714985 -0.328478 -0.286859 0.0
min -0.410081 0.0 -1.760008 -0.889020 -2.470072 -0.824068 -0.983686 0.0
skew 0.669913 0.0 -0.709483 -1.043688 -1.369695 -0.588570 -1.095736 0.0
std 0.486056 0.0 0.639727 0.491651 0.429681 0.327508 0.451170 0.0
var 0.236251 0.0 0.409251 0.241720 0.184626 0.107262 0.203554 0.0
P2O5 PbO SO2 SiO2 SnO2 SrO
cv -0.562597 NaN NaN 0.044754 NaN NaN
kurt 2.101884 0.0 0.0 3.641136 0.0 0.0
max 0.000000 0.0 0.0 4.372977 0.0 0.0
mean -1.326415 0.0 0.0 4.187045 0.0 0.0
min -2.178840 0.0 0.0 3.830498 0.0 0.0
skew 1.134407 0.0 0.0 -1.731995 0.0 0.0
std 0.746238 0.0 0.0 0.187388 0.0 0.0
var 0.556871 0.0 0.0 0.035114 0.0 0.0 ,
'铅钡_无风化': Al2O3 BaO CaO CuO Fe2O3 K2O MgO \
cv 3.716292 0.352188 -0.987216 -1.103642 -2.376125 -0.899079 -1.163923
kurt 0.214284 1.405046 -0.671685 -0.661301 4.165086 -1.951127 -0.717171
max 0.901223 2.031090 0.340114 0.899535 0.554504 0.000000 0.000000
mean 0.138882 1.245669 -0.714861 -0.925721 -0.306467 -1.288085 -0.541147
min -0.847107 0.260264 -1.990837 -2.580097 -2.264904 -2.915489 -1.822866
skew -0.716711 -0.562582 0.062455 0.086620 -1.989760 0.104047 -0.750761
std 0.516125 0.438710 0.705723 1.021664 0.728205 1.158091 0.629853
var 0.266385 0.192466 0.498044 1.043798 0.530282 1.341175 0.396715
Na2O P2O5 PbO SO2 SiO2 SnO2 SrO
cv 3.684555 -0.818040 0.266446 3.605551 0.214416 -2.441987 -0.893422
kurt 8.623783 -1.684970 6.556376 13.000000 -0.956815 3.253187 -2.023534
max 0.876318 0.000000 2.610837 0.267396 3.871521 0.000000 0.000000
mean 0.071131 -1.449052 2.160856 0.020569 3.013743 -0.311426 -1.114090
min -0.221722 -3.201927 0.468937 0.000000 1.859524 -2.078030 -2.211561
skew 2.741762 -0.069394 -2.363412 3.605551 -0.301305 -2.182647 0.129023
std 0.262087 1.185383 0.575751 0.074162 0.646195 0.760497 0.995352
var 0.068690 1.405133 0.331490 0.005500 0.417567 0.578356 0.990726 ,
'铅钡_风化': Al2O3 BaO CaO CuO Fe2O3 K2O MgO \
cv -11.231984 0.609170 -1.725044 -1.063677 -1.111626 -1.185000 -0.978333
kurt -0.288489 -0.601793 -0.712685 -0.599524 -0.418132 -1.660810 -1.486424
max 2.042802 2.167893 0.497358 0.888513 0.000000 0.000000 0.000000
mean -0.087576 1.035546 -0.375654 -0.824426 -0.723172 -0.967980 -0.693444
min -1.826182 -0.181275 -1.877738 -2.764779 -2.575747 -2.970023 -1.841063
skew 0.155720 -0.126683 -0.661062 0.043889 -0.837263 -0.456970 -0.340389
std 0.983655 0.630823 0.648019 0.876923 0.803897 1.147056 0.678420
var 0.967578 0.397938 0.419929 0.768995 0.646251 1.315737 0.460253
Na2O P2O5 PbO SO2 SiO2 SnO2 SrO
cv 27.248350 -11.063796 0.221513 12.007019 0.411973 -3.664983 -0.415568
kurt 3.996993 1.120939 -0.753508 9.857280 0.650707 13.632917 1.634188
max 1.043858 1.188784 3.510396 1.369229 3.937307 0.000000 0.000000
mean 0.013371 -0.102296 2.402080 0.028021 2.242329 -0.119384 -1.827413
min -1.093837 -3.229330 1.389649 -0.796562 -0.131353 -1.944122 -2.930869
skew -0.038016 -1.253158 0.239303 2.108909 -0.584811 -3.788951 1.424906
std 0.364329 1.131785 0.532092 0.336451 0.923780 0.437542 0.759414
var 0.132736 1.280938 0.283122 0.113199 0.853370 0.191443 0.576710 }
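The reshaping step above can be tried on a small made-up table. This is only a sketch: it assumes `grouped_stats` came from a `groupby(...).agg(...)` call, which produces two-level columns of (component, statistic). That construction lies outside this section, so it is an assumption, and the column names `glass_type` and `weathering` and all values below are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: two grouping keys and two component columns.
toy = pd.DataFrame({
    "glass_type": ["K", "K", "K", "PbBa", "PbBa", "PbBa"],
    "weathering": ["no", "no", "no", "yes", "yes", "yes"],
    "SiO2": [3.0, 3.2, 3.4, 2.0, 2.2, 2.6],
    "PbO": [0.0, -1.0, -2.0, 2.0, 2.5, 3.0],
})

# groupby + agg gives columns with two levels: (component, statistic).
toy_stats = toy.groupby(["glass_type", "weathering"]).agg(["mean", "std", "min", "max"])

# Selecting one group returns a Series indexed by (component, statistic).
# unstack() moves the statistic level into columns, and .T flips the result
# so that rows are statistics and columns are components.
one_group = toy_stats.loc[("K", "no"), :].unstack().T
print(one_group)
```

The result has the same layout as each entry of `tables_dict`: one row per statistic and one column per component.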
# Optional export of each table to its own Excel sheet (left disabled as a string literal)
'''
with pd.ExcelWriter('E:\\数学建模国赛\\2022数学建模赛题\\C题\\一二表单合并数据统计性分析.xlsx') as writer:
    for sheet_name, df in tables_dict.items():
        df.to_excel(writer, sheet_name=sheet_name, index=True)
'''
2.4 Descriptive statistics
tables_dict['高钾_无风化']
| | Al2O3 | BaO | CaO | CuO | Fe2O3 | K2O | MgO | Na2O | P2O5 | PbO | SO2 | SiO2 | SnO2 | SrO |
| cv | 0.664393 | -1.972230 | 0.893838 | -2.321136 | -1.626433 | 0.473990 | -0.700958 | -19.285768 | -0.979906 | -1.116780 | -1.824002 | 0.114732 | -3.464102 | -1.050200 |
| kurt | -1.409964 | 3.016385 | -0.156702 | 1.577446 | 0.472540 | 1.635379 | -1.292382 | 7.015733 | 0.317255 | -1.629147 | -0.011455 | 3.036563 | 12.000000 | -2.376521 |
| max | 1.508084 | 0.019074 | 1.647769 | 0.595531 | 0.747950 | 2.210662 | 0.000000 | 0.320182 | 0.526955 | 0.000000 | 0.000000 | 3.712288 | 0.000000 | 0.000000 |
| mean | 0.776104 | -0.179823 | 0.599071 | -0.262942 | -0.390464 | 1.145963 | -0.674968 | -0.013585 | -0.938500 | -0.987338 | -0.507620 | 3.165687 | -0.007795 | -1.723790 |
| min | 0.006978 | -1.080913 | -0.182189 | -1.652716 | -1.590841 | 0.000000 | -1.332161 | -0.760277 | -2.730275 | -2.672140 | -2.240723 | 2.266609 | -0.093536 | -3.774602 |
| skew | -0.031480 | -1.906416 | 0.378894 | -1.180633 | -0.394538 | -0.184857 | 0.061519 | -2.150622 | 0.057567 | -0.552251 | -1.388056 | -1.093726 | -3.464102 | -0.037176 |
| std | 0.515638 | 0.354653 | 0.535473 | 0.610324 | 0.635064 | 0.543175 | 0.473124 | 0.262001 | 0.919641 | 1.102639 | 0.925901 | 0.363205 | 0.027002 | 1.810324 |
| var | 0.265882 | 0.125778 | 0.286731 | 0.372495 | 0.403306 | 0.295039 | 0.223846 | 0.068645 | 0.845740 | 1.215812 | 0.857292 | 0.131918 | 0.000729 | 3.277274 |
tables_dict['高钾_风化']
| | Al2O3 | BaO | CaO | CuO | Fe2O3 | K2O | MgO | Na2O | P2O5 | PbO | SO2 | SiO2 | SnO2 | SrO |
| cv | 2.498627 | NaN | -0.962261 | -8.191497 | -0.250545 | -0.997049 | -1.572791 | NaN | -0.562597 | NaN | NaN | 0.044754 | NaN | NaN |
| kurt | 0.025390 | 0.0 | 2.287842 | 0.619598 | 1.095297 | -0.867476 | -1.112631 | 0.0 | 2.101884 | 0.0 | 0.0 | 3.641136 | 0.0 | 0.0 |
| max | 0.961580 | 0.0 | 0.215634 | 0.477459 | -1.341006 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 4.372977 | 0.0 | 0.0 |
| mean | 0.194529 | 0.0 | -0.664817 | -0.060020 | -1.714985 | -0.328478 | -0.286859 | 0.0 | -1.326415 | 0.0 | 0.0 | 4.187045 | 0.0 | 0.0 |
| min | -0.410081 | 0.0 | -1.760008 | -0.889020 | -2.470072 | -0.824068 | -0.983686 | 0.0 | -2.178840 | 0.0 | 0.0 | 3.830498 | 0.0 | 0.0 |
| skew | 0.669913 | 0.0 | -0.709483 | -1.043688 | -1.369695 | -0.588570 | -1.095736 | 0.0 | 1.134407 | 0.0 | 0.0 | -1.731995 | 0.0 | 0.0 |
| std | 0.486056 | 0.0 | 0.639727 | 0.491651 | 0.429681 | 0.327508 | 0.451170 | 0.0 | 0.746238 | 0.0 | 0.0 | 0.187388 | 0.0 | 0.0 |
| var | 0.236251 | 0.0 | 0.409251 | 0.241720 | 0.184626 | 0.107262 | 0.203554 | 0.0 | 0.556871 | 0.0 | 0.0 | 0.035114 | 0.0 | 0.0 |
tables_dict['铅钡_无风化']
| | Al2O3 | BaO | CaO | CuO | Fe2O3 | K2O | MgO | Na2O | P2O5 | PbO | SO2 | SiO2 | SnO2 | SrO |
| cv | 3.716292 | 0.352188 | -0.987216 | -1.103642 | -2.376125 | -0.899079 | -1.163923 | 3.684555 | -0.818040 | 0.266446 | 3.605551 | 0.214416 | -2.441987 | -0.893422 |
| kurt | 0.214284 | 1.405046 | -0.671685 | -0.661301 | 4.165086 | -1.951127 | -0.717171 | 8.623783 | -1.684970 | 6.556376 | 13.000000 | -0.956815 | 3.253187 | -2.023534 |
| max | 0.901223 | 2.031090 | 0.340114 | 0.899535 | 0.554504 | 0.000000 | 0.000000 | 0.876318 | 0.000000 | 2.610837 | 0.267396 | 3.871521 | 0.000000 | 0.000000 |
| mean | 0.138882 | 1.245669 | -0.714861 | -0.925721 | -0.306467 | -1.288085 | -0.541147 | 0.071131 | -1.449052 | 2.160856 | 0.020569 | 3.013743 | -0.311426 | -1.114090 |
| min | -0.847107 | 0.260264 | -1.990837 | -2.580097 | -2.264904 | -2.915489 | -1.822866 | -0.221722 | -3.201927 | 0.468937 | 0.000000 | 1.859524 | -2.078030 | -2.211561 |
| skew | -0.716711 | -0.562582 | 0.062455 | 0.086620 | -1.989760 | 0.104047 | -0.750761 | 2.741762 | -0.069394 | -2.363412 | 3.605551 | -0.301305 | -2.182647 | 0.129023 |
| std | 0.516125 | 0.438710 | 0.705723 | 1.021664 | 0.728205 | 1.158091 | 0.629853 | 0.262087 | 1.185383 | 0.575751 | 0.074162 | 0.646195 | 0.760497 | 0.995352 |
| var | 0.266385 | 0.192466 | 0.498044 | 1.043798 | 0.530282 | 1.341175 | 0.396715 | 0.068690 | 1.405133 | 0.331490 | 0.005500 | 0.417567 | 0.578356 | 0.990726 |
tables_dict['铅钡_风化']
| | Al2O3 | BaO | CaO | CuO | Fe2O3 | K2O | MgO | Na2O | P2O5 | PbO | SO2 | SiO2 | SnO2 | SrO |
| cv | -11.231984 | 0.609170 | -1.725044 | -1.063677 | -1.111626 | -1.185000 | -0.978333 | 27.248350 | -11.063796 | 0.221513 | 12.007019 | 0.411973 | -3.664983 | -0.415568 |
| kurt | -0.288489 | -0.601793 | -0.712685 | -0.599524 | -0.418132 | -1.660810 | -1.486424 | 3.996993 | 1.120939 | -0.753508 | 9.857280 | 0.650707 | 13.632917 | 1.634188 |
| max | 2.042802 | 2.167893 | 0.497358 | 0.888513 | 0.000000 | 0.000000 | 0.000000 | 1.043858 | 1.188784 | 3.510396 | 1.369229 | 3.937307 | 0.000000 | 0.000000 |
| mean | -0.087576 | 1.035546 | -0.375654 | -0.824426 | -0.723172 | -0.967980 | -0.693444 | 0.013371 | -0.102296 | 2.402080 | 0.028021 | 2.242329 | -0.119384 | -1.827413 |
| min | -1.826182 | -0.181275 | -1.877738 | -2.764779 | -2.575747 | -2.970023 | -1.841063 | -1.093837 | -3.229330 | 1.389649 | -0.796562 | -0.131353 | -1.944122 | -2.930869 |
| skew | 0.155720 | -0.126683 | -0.661062 | 0.043889 | -0.837263 | -0.456970 | -0.340389 | -0.038016 | -1.253158 | 0.239303 | 2.108909 | -0.584811 | -3.788951 | 1.424906 |
| std | 0.983655 | 0.630823 | 0.648019 | 0.876923 | 0.803897 | 1.147056 | 0.678420 | 0.364329 | 1.131785 | 0.532092 | 0.336451 | 0.923780 | 0.437542 | 0.759414 |
| var | 0.967578 | 0.397938 | 0.419929 | 0.768995 | 0.646251 | 1.315737 | 0.460253 | 0.132736 | 1.280938 | 0.283122 | 0.113199 | 0.853370 | 0.191443 | 0.576710 |
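The cv rows in the four tables are numerically consistent with std divided by mean (for example, 0.515638 / 0.776104 ≈ 0.664 for Al2O3 in the 高钾_无风化 table). That ratio becomes unstable when the mean is close to zero, which would explain entries such as -19.29 and 27.25 for Na2O. A minimal sketch using values copied from the tables; the helper name `coefficient_of_variation` is hypothetical and does not come from the original notebook.

```python
import math

def coefficient_of_variation(std, mean):
    """Return std / mean, or NaN when the mean is exactly zero."""
    if mean == 0:
        return math.nan
    return std / mean

# (std, mean) pairs copied from the tables above.
examples = {
    "high-K unweathered Al2O3": (0.515638, 0.776104),
    "high-K unweathered Na2O": (0.262001, -0.013585),
    "Pb-Ba weathered Na2O": (0.364329, 0.013371),
    "high-K weathered BaO": (0.0, 0.0),
}

for label, (std, mean) in examples.items():
    print(label, round(coefficient_of_variation(std, mean), 3))
```

When a mean sits this close to zero, the sign and size of cv say little about dispersion, so comparing std directly is probably more informative for those components.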
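'''
Mean (table values in parentheses):
SiO2 (silicon dioxide): among unweathered samples, the high-potassium mean (3.17) is only slightly above the lead-barium mean (3.01).
After weathering the gap widens (4.19 vs. 2.24), which suggests that weathering affects SiO2 differently in the two glass types.
Al2O3 (aluminium oxide): among unweathered samples, the high-potassium mean (0.78) is above the lead-barium mean (0.14).
After weathering both means fall (0.19 and -0.09) and the gap narrows, so weathering may also affect Al2O3.
Standard deviation (Std) and coefficient of variation (CV):
Na2O (sodium oxide): among unweathered samples the two types have almost the same std (0.26), and the lead-barium mean (0.07) is slightly higher than the high-potassium mean (-0.01).
After weathering every high-potassium value is 0 (std 0, CV undefined), while the lead-barium std rises to 0.36, which may indicate that weathering changes how Na2O is distributed.
CaO (calcium oxide): the ordering of the means reverses. High-potassium glass is higher before weathering (0.60 vs. -0.71), and lead-barium glass is higher after it (-0.38 vs. -0.66).
Skewness (Skew) and kurtosis (Kurt):
PbO (lead oxide) and BaO (barium oxide): skewness and kurtosis differ markedly between high-potassium and lead-barium glass; for unweathered PbO, for example, kurtosis is -1.63 versus 6.56.
This may reflect structural differences between the glass types and different effects of weathering.
Observations on specific components:
SiO2: after weathering, the high-potassium mean is nearly twice the lead-barium mean, whereas the unweathered means are close.
Al2O3: weathering appears to matter most for lead-barium glass, where the std nearly doubles (0.52 to 0.98).
'''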
2.5 Box plots
import matplotlib.pyplot as plt
import seaborn as sns
plt.rcParams['font.family'] = 'DejaVu Sans'
# Subset the merged data by glass type and weathering state
data_high_potassium_erosion = data[(data['类型'] == '高钾') & (data['表面风化'] == '风化')]
data_high_potassium_no_erosion = data[(data['类型'] == '高钾') & (data['表面风化'] == '无风化')]
data_lead_barium_erosion = data[(data['类型'] == '铅钡') & (data['表面风化'] == '风化')]
data_lead_barium_no_erosion = data[(data['类型'] == '铅钡') & (data['表面风化'] == '无风化')]
# Reshape each subset to long format: one row per (sample, component) value
boxplot_data_high_potassium_erosion = data_high_potassium_erosion.melt(id_vars=['类型', '表面风化'], value_vars=component_cols)
boxplot_data_high_potassium_no_erosion = data_high_potassium_no_erosion.melt(id_vars=['类型', '表面风化'], value_vars=component_cols)
boxplot_data_lead_barium_erosion = data_lead_barium_erosion.melt(id_vars=['类型', '表面风化'], value_vars=component_cols)
boxplot_data_lead_barium_no_erosion = data_lead_barium_no_erosion.melt(id_vars=['类型', '表面风化'], value_vars=component_cols)
# One panel per subset on a 2x2 grid (the extra plt.figure call only produced an empty figure, so it is dropped)
fig, axs = plt.subplots(2, 2, figsize=(12, 8))
data_list = [boxplot_data_lead_barium_erosion, boxplot_data_lead_barium_no_erosion, boxplot_data_high_potassium_erosion, boxplot_data_high_potassium_no_erosion]
titles = ['Lead Barium Glass with Erosion', 'Lead Barium Glass without Erosion', 'High Potassium Glass with Erosion', 'High Potassium Glass without Erosion']
# Use plot_data as the loop variable so the merged DataFrame `data` is not overwritten
for ax, plot_data, title in zip(axs.flatten(), data_list, titles):
    sns.boxplot(y='variable', x='value', data=plot_data, ax=ax, orient="h")
    ax.set_ylabel('Chemical Component')
    ax.set_xlabel('Content (%)')
    ax.set_title(title)
    ax.invert_yaxis()
plt.tight_layout()
plt.show()
'''
Lead-barium glass:
Changes from unweathered to weathered:
Lower medians: the medians of most chemical components fall with weathering, in particular Al2O3, K2O, SiO2, CaO, MgO and Na2O.
This may reflect loss of these components during weathering.
Lower dispersion: the spread of these components also shrinks, which suggests that weathering may make their contents more uniform.
Observations on specific components:
Aluminium oxide (Al2O3): weathering lowers the median and tightens the distribution.
Silicon dioxide (SiO2): weathering lowers the median and tightens the distribution.
Potassium oxide (K2O) and sodium oxide (Na2O): the distributions tighten and the medians fall.
High-potassium glass:
Changes from unweathered to weathered:
Lower medians: the medians of most components also fall with weathering, especially K2O and Na2O, as in lead-barium glass.
Change in dispersion: unlike lead-barium glass, some components spread out more after weathering, for example SiO2 and K2O.
Observations on specific components:
Aluminium oxide (Al2O3): the distribution becomes wider after weathering.
Silicon dioxide (SiO2): weathering does not appear to shift the median much, but the distribution becomes wider.
Potassium oxide (K2O) and sodium oxide (Na2O): the medians fall sharply and the distributions widen.
Summary:
The box plots show how weathering affects glass composition.
In both lead-barium and high-potassium glass, weathering may cause some components to be lost, but the size of the effect depends on the glass type and the component.
These observations help explain how weathering alters the chemistry of each glass type, and can in turn inform the conservation and restoration of artefacts.
'''
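The box-plot reading above rests on medians and spreads, which can also be tabulated directly. A minimal sketch on made-up numbers, reshaping with `melt` as the plotting code does and then computing the median and interquartile range per group. The column names `type` and `weathering` and all values are hypothetical stand-ins for the notebook's `data`.

```python
import pandas as pd

# Hypothetical stand-in for the merged table used by the plotting code.
toy = pd.DataFrame({
    "type": ["PbBa"] * 8,
    "weathering": ["no"] * 4 + ["yes"] * 4,
    "SiO2": [50.0, 54.0, 58.0, 62.0, 20.0, 24.0, 26.0, 30.0],
    "K2O": [0.2, 0.4, 0.6, 0.8, 0.0, 0.1, 0.1, 0.2],
})

# Wide to long: one row per (sample, component) value, as in the box-plot code.
long_df = toy.melt(id_vars=["type", "weathering"], value_vars=["SiO2", "K2O"])

# Median (the line inside each box) and IQR (the length of each box)
# for every combination of glass type, weathering state and component.
grouped = long_df.groupby(["type", "weathering", "variable"])["value"]
summary = pd.DataFrame({
    "median": grouped.median(),
    "iqr": grouped.quantile(0.75) - grouped.quantile(0.25),
})
print(summary)
```

Comparing the `median` and `iqr` columns between the weathered and unweathered rows gives a numeric check on statements such as "the median falls" or "the distribution tightens".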