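1 Preliminaries
To keep the exposition simple, we start with a few basic definitions.
Suppose we have a first-degree polynomial function of several variables: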
$$f(x_1,x_2,\cdots,x_m)=\sum_{i=1}^m a_i x_i$$
We collect all of the function's independent variables into a column vector $x$:
$$x=[x_1,x_2,\cdots,x_m]^T$$
Then $x$ is called a vector variable.
For example, $[x_1,x_2]^T$ is the vector variable of $f(x_1,x_2)=x_1+2x_2$.
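If we now go through the entries of the vector variable in order and put, in each position, the partial derivative of the function with respect to that entry, the result is the derivative of $f(x_1,x_2,\cdots,x_m)$ with respect to the vector variable $x$, that is: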
$$\frac{\partial f(x_1,x_2,\cdots,x_m)}{\partial x}=\left[\frac{\partial f(x_1,x_2,\cdots,x_m)}{\partial x_1},\frac{\partial f(x_1,x_2,\cdots,x_m)}{\partial x_2},\cdots,\frac{\partial f(x_1,x_2,\cdots,x_m)}{\partial x_m}\right]^T$$
Based on this, we define the derivative with respect to a vector:
Let $f(x)$ be a function of $x$, where $x$ is a vector variable and $x = [x_1, x_2,...,x_n]^T$.
Then
$$\frac{\partial f}{\partial x} = \left[\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, ..., \frac{\partial f}{\partial x_n}\right]^T$$
This expression is also called the gradient-vector form of the vector derivative:
$$\nabla_x f(x) = \frac{\partial f}{\partial x} = \left[\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, ..., \frac{\partial f}{\partial x_n}\right]^T$$
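The definition of the gradient can be checked numerically. The following is a minimal sketch, not part of the original article: `numerical_gradient` is a hypothetical helper that approximates each partial derivative by a central difference, and the evaluation point `[0.5, -1.0]` is an arbitrary choice. It is applied to the example function $f(x_1,x_2)=x_1+2x_2$ from earlier in the text.

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Approximate the gradient of a scalar function f at x (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        # i-th partial derivative via a central difference along the i-th axis
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

# Example from the text: f(x1, x2) = x1 + 2*x2
f = lambda x: x[0] + 2 * x[1]
grad = numerical_gradient(f, [0.5, -1.0])  # evaluation point chosen arbitrarily
print(grad)  # approximately [1. 2.]
```

The entries appear in the same order as the entries of the vector variable, which is exactly what the definition prescribes.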
Next we prove several identities, all of which are used in the matrix-form derivation of least squares.
Identity 1:
$$\frac{\partial a}{\partial x} = 0$$
where $a$ is a constant that does not depend on $x$, and the $0$ on the right-hand side is the $n$-dimensional zero vector.
Proof:
$$\frac{\partial a}{\partial x} = \left[\frac{\partial a}{\partial x_1}, \frac{\partial a}{\partial x_2}, ..., \frac{\partial a}{\partial x_n}\right]^T = [0,0,...,0]^T$$
Identity 2:
$$\frac{\partial(x^T \cdot A)}{\partial x} = \frac{\partial(A^T \cdot x)}{\partial x} = A$$
Proof: let $A = [a_1, a_2,...,a_n]^T$. Then:
$$\begin{aligned} \frac{\partial(x^T \cdot A)}{\partial x} & = \frac{\partial(A^T \cdot x)}{\partial x}\\ & = \frac{\partial(a_1 \cdot x_1 + a_2 \cdot x_2 +...+ a_n \cdot x_n)}{\partial x}\\ & = \left[\begin{array}{c} \frac{\partial(a_1 \cdot x_1 + a_2 \cdot x_2 +...+ a_n \cdot x_n)}{\partial x_1} \\ \frac{\partial(a_1 \cdot x_1 + a_2 \cdot x_2 +...+ a_n \cdot x_n)}{\partial x_2} \\ \vdots \\ \frac{\partial(a_1 \cdot x_1 + a_2 \cdot x_2 +...+ a_n \cdot x_n)}{\partial x_n} \end{array}\right] \\ & = \left[\begin{array}{c} a_1 \\ a_2 \\ \vdots \\ a_n \end{array}\right] = A \end{aligned}$$
Identity 3:
$$\frac{\partial (x^T \cdot x)}{\partial x} = 2x$$
Proof:
$$\begin{aligned} \frac{\partial(x^T \cdot x)}{\partial x} & = \frac{\partial(x_1^2+x_2^2+...+x_n^2)}{\partial x}\\ & = \left[\begin{array}{c} \frac{\partial(x_1^2+x_2^2+...+x_n^2)}{\partial x_1} \\ \frac{\partial(x_1^2+x_2^2+...+x_n^2)}{\partial x_2} \\ \vdots \\ \frac{\partial(x_1^2+x_2^2+...+x_n^2)}{\partial x_n} \end{array}\right] \\ & = \left[\begin{array}{c} 2x_1 \\ 2x_2 \\ \vdots \\ 2x_n \end{array}\right] = 2x \end{aligned}$$
Here $x^Tx$ is also called the cross-product (crossprod) of the vector. It is the inner product of $x$ with itself and should not be confused with the three-dimensional cross product $\times$.
Identity 4:
$$\frac{\partial (x^T A x)}{\partial x} = Ax + A^Tx$$
Proof: here $A$ is an $n \times n$ matrix. First, expand the quadratic form:
$$\begin{aligned} x^TAx &= [x_1, x_2,...,x_n] \cdot \left[\begin{array}{cccc} a_{11} &a_{12} &... &a_{1n}\\ a_{21} &a_{22} &... &a_{2n}\\ ... &... &... &... \\ a_{n1} &a_{n2} &... &a_{nn} \end{array}\right] \cdot [x_1, x_2,...,x_n]^T \\ &= [x_1a_{11}+x_2a_{21}+...+x_na_{n1},\ x_1a_{12}+x_2a_{22}+...+x_na_{n2},\ ...,\ x_1a_{1n}+x_2a_{2n}+...+x_na_{nn}] \cdot \left[\begin{array}{c} x_1 \\ x_2 \\ \vdots \\ x_n \end{array}\right] \\ &= x_1(x_1a_{11}+x_2a_{21}+...+x_na_{n1})+x_2(x_1a_{12}+x_2a_{22}+...+x_na_{n2})+...+x_n(x_1a_{1n}+x_2a_{2n}+...+x_na_{nn}) \end{aligned}$$
Let:
$$k(x) = x_1(x_1a_{11}+x_2a_{21}+...+x_na_{n1})+x_2(x_1a_{12}+x_2a_{22}+...+x_na_{n2})+...+x_n(x_1a_{1n}+x_2a_{2n}+...+x_na_{nn})$$
Then, because $x_1$ appears both as the leading factor of the first term and once inside every bracket:
$$\frac{\partial k(x)}{\partial x_1} = (x_1a_{11}+x_2a_{21}+...+x_na_{n1})+ (x_1a_{11} + x_2a_{12}+...+x_na_{1n})$$
The first bracket is the first entry of $A^Tx$ and the second bracket is the first entry of $Ax$. The same argument applies to every $x_i$:
$$\frac{\partial k(x)}{\partial x_i} = \sum_{j=1}^n a_{ji}x_j + \sum_{j=1}^n a_{ij}x_j$$
So:
$$\frac{\partial (x^TAx)}{\partial x} = A^Tx + Ax$$
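Identities 2 to 4 can be sanity-checked numerically. The sketch below is an illustration rather than part of the original proof: `numerical_gradient` is a hypothetical finite-difference helper, the vector `a` plays the role of $A$ in Identity 2, the matrix `M` plays the role of $A$ in Identity 4, and all values are random test data. `M` is deliberately not symmetric, so the check would fail if the $A^Tx$ term were dropped.

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Central-difference gradient of a scalar function f at x (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)   # fixed seed so the check is repeatable
n = 4
x = rng.normal(size=n)
a = rng.normal(size=n)           # the vector called A in Identity 2
M = rng.normal(size=(n, n))      # the matrix called A in Identity 4 (not symmetric)

# Identity 2: d(x^T a)/dx = a
check2 = np.allclose(numerical_gradient(lambda v: v @ a, x), a, atol=1e-5)
# Identity 3: d(x^T x)/dx = 2x
check3 = np.allclose(numerical_gradient(lambda v: v @ v, x), 2 * x, atol=1e-5)
# Identity 4: d(x^T M x)/dx = M x + M^T x
check4 = np.allclose(numerical_gradient(lambda v: v @ M @ v, x), M @ x + M.T @ x, atol=1e-5)
print(check2, check3, check4)
```

A numerical check like this does not replace the proofs, but it catches sign and transpose mistakes quickly.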
2 Deriving the matrix form of least squares
Suppose we have a multivariate linear model:
$$f(x) = w_1x_1+w_2x_2+...+w_dx_d+b$$
Let $w = [w_1,w_2,...,w_d]^T$ and $x = [x_1,x_2,...,x_d]^T$. The expression above can then be written as:
$$f(x) = w^Tx+b$$
This can be made more compact still. Let:
$$\begin{aligned} \hat w &= [w_1,w_2,...,w_d,b]^T \\ \hat x &= [x_1,x_2,...,x_d,1]^T \end{aligned}$$
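Suppose we have $m$ observations in total ($m>d$), where $x^{(i)} = [x_1^{(i)}, x_2^{(i)},...,x_d^{(i)}]$ holds the feature values of the $i$-th observation and $y^{(i)}$ is its observed response. Substituting each observation into $f(x)$ gives $m$ equations:
$$\begin{aligned} w_1x_1^{(1)}+w_2x_2^{(1)}+...+w_dx_d^{(1)}+b &= y^{(1)} \\ w_1x_1^{(2)}+w_2x_2^{(2)}+...+w_dx_d^{(2)}+b &= y^{(2)} \\ &\ \ \vdots \\ w_1x_1^{(m)}+w_2x_2^{(m)}+...+w_dx_d^{(m)}+b &= y^{(m)} \end{aligned}$$
Further, let:
$$\hat X = \left[\begin{array}{ccccc} x_1^{(1)} & x_2^{(1)} & ... & x_d^{(1)} & 1 \\ x_1^{(2)} & x_2^{(2)} & ... & x_d^{(2)} & 1 \\ ... & ... & ... & ... & ... \\ x_1^{(m)} & x_2^{(m)} & ... & x_d^{(m)} & 1 \end{array}\right], \quad \hat y = [y^{(1)}, y^{(2)},...,y^{(m)}]^T$$
The system of equations can then be written as:
$$\hat X \cdot \hat w = \hat y$$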
The linear model can likewise be written as:
$$f(\hat x) = \hat w^T \cdot \hat x$$
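Absorbing the intercept $b$ into $\hat w$ amounts to appending a column of ones to the data. A minimal sketch, assuming NumPy arrays; `augment` is a hypothetical helper name and the numbers are made up purely for illustration:

```python
import numpy as np

def augment(X_raw):
    """Append a column of ones so that b can be treated as the last entry of w_hat."""
    X_raw = np.asarray(X_raw, dtype=float)
    if X_raw.ndim == 1:                      # a single feature given as a flat array
        X_raw = X_raw.reshape(-1, 1)
    ones = np.ones((X_raw.shape[0], 1))
    return np.hstack([X_raw, ones])

# Made-up data: m = 3 observations, d = 2 features
X_raw = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
w = np.array([0.5, -1.0])                    # illustrative weights
b = 2.0                                      # illustrative intercept

X_hat = augment(X_raw)                       # shape (3, 3), last column all ones
w_hat = np.append(w, b)                      # [w1, w2, b]

pred_plain = X_raw @ w + b                   # f(x) = w^T x + b, row by row
pred_aug = X_hat @ w_hat                     # f(x_hat) = w_hat^T x_hat, row by row
print(np.allclose(pred_plain, pred_aug))
```

Both forms give the same predictions; the augmented form simply lets the intercept be estimated together with the other coefficients.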
We can now set up an optimization model that minimizes the sum of squared errors $SSE$ (from here on, $X$ and $y$ are written for $\hat X$ and $\hat y$):
$$\min S(\hat w) = ||y - X\hat w||_2^2 = (y - X\hat w)^T(y - X\hat w)$$
In the expression above, $||y - X\hat w||_2$ is the 2-norm of a vector. It is computed by summing the squares of the components and then taking the square root. For example, if $a=[1,-1]$, then $||a||_2=\sqrt{1^2+(-1)^2}=\sqrt{2}$.
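The 2-norm and the objective $S(\hat w)$ translate directly into NumPy. This is a small sketch under stated assumptions: `sse` is a hypothetical helper name, and `X`, `y` and the trial parameter vector `w_try` are made-up values used only to show that the squared-norm form and the inner-product form agree.

```python
import numpy as np

# The example from the text: a = [1, -1]
a = np.array([1.0, -1.0])
norm_a = np.linalg.norm(a)       # for a 1-D array the default is the 2-norm
print(norm_a)                    # approximately 1.41421356 (sqrt(2))

def sse(X, y, w_hat):
    """Sum of squared errors (y - X w_hat)^T (y - X w_hat)."""
    r = y - X @ w_hat            # residual vector
    return r @ r                 # inner product of the residual with itself

# Made-up data and an arbitrary trial parameter vector
X = np.array([[1.0, 1.0], [3.0, 1.0]])
y = np.array([2.0, 4.0])
w_try = np.array([0.0, 0.0])

s_inner = sse(X, y, w_try)                       # inner-product form
s_norm = np.linalg.norm(y - X @ w_try) ** 2      # squared 2-norm form
print(s_inner, s_norm)
```

With `w_try` all zeros the residual is just `y`, so both values equal $2^2 + 4^2 = 20$.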
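Because $S(\hat w)$ is a convex quadratic function of $\hat w$, we only need to find the zero of its derivative to obtain the optimal solution, that is, the optimal $\hat w$. These are the fitted parameters, and they give the fitted multivariate function.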
Before doing so, we need two rules for the matrix transpose:
$$\begin{aligned} (A-B)^T &= A^T-B^T \\ (AB)^T &= B^TA^T \end{aligned}$$
Differentiate $S(\hat w)$ and set the result to 0:
$$\begin{aligned} \frac{\partial S(\hat w)}{\partial \hat w} &= \frac{\partial ||y - X\hat w||_2^2}{\partial \hat w} \\ &= \frac{\partial (y - X\hat w)^T(y - X\hat w)}{\partial \hat w} \\ &= \frac{\partial (y^T - \hat w^TX^T)(y - X\hat w)}{\partial \hat w} \\ &= \frac{\partial (y^Ty - \hat w^TX^Ty - y^TX\hat w + \hat w^TX^TX\hat w)}{\partial \hat w} \\ &= 0 - X^Ty - X^Ty + X^TX\hat w + (X^TX)^T\hat w \\ &= 0 - X^Ty - X^Ty + 2X^TX\hat w \\ &= 2(X^TX\hat w - X^Ty) = 0 \end{aligned}$$
Here $y^Ty$ is constant in $\hat w$ (Identity 1), the terms $\hat w^TX^Ty$ and $y^TX\hat w$ each have derivative $X^Ty$ (Identity 2), and the last term follows from Identity 4 applied to the matrix $X^TX$, which is symmetric, so $(X^TX)^T = X^TX$.
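The closed-form gradient $2(X^TX\hat w - X^Ty)$ can be compared against a finite-difference approximation of $S(\hat w)$. The following sketch uses randomly generated data (not from the article) and a hypothetical function name `S`; it is meant only as a consistency check of the derivation.

```python
import numpy as np

rng = np.random.default_rng(1)                 # fixed seed for repeatability
m, d = 6, 2                                    # made-up problem size with m > d
X = np.hstack([rng.normal(size=(m, d)), np.ones((m, 1))])  # last column of ones for b
y = rng.normal(size=m)
w_hat = rng.normal(size=d + 1)                 # arbitrary point at which to compare

def S(w):
    """Objective S(w) = (y - X w)^T (y - X w)."""
    r = y - X @ w
    return r @ r

# Gradient from the derivation
analytic = 2 * (X.T @ X @ w_hat - X.T @ y)

# Central differences along each coordinate direction e
eps = 1e-6
numeric = np.array([
    (S(w_hat + eps * e) - S(w_hat - eps * e)) / (2 * eps)
    for e in np.eye(d + 1)
])

max_diff = np.max(np.abs(analytic - numeric))
print(max_diff)   # should be tiny compared with the gradient entries
```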
That is:
$$X^TX\hat w = X^Ty$$
If $X^TX$ is invertible, then:
$$\hat w = (X^TX)^{-1}X^Ty$$
This gives the fitted $\hat w$, which completes the derivation of least squares.
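In code, the equation $X^TX\hat w = X^Ty$ can be solved as a linear system instead of forming the inverse explicitly, which is generally the more numerically stable choice. The sketch below is an illustration on synthetic, noise-free data: `true_w_hat` and the random features are hypothetical, and `np.linalg.lstsq` is used only as an independent cross-check.

```python
import numpy as np

rng = np.random.default_rng(42)                # fixed seed for repeatability
m, d = 20, 3                                   # made-up sizes with m > d
true_w_hat = np.array([2.0, -1.0, 0.5, 3.0])   # hypothetical [w1, w2, w3, b]

X = np.hstack([rng.normal(size=(m, d)), np.ones((m, 1))])  # augmented design matrix
y = X @ true_w_hat                             # noise-free responses

# Solve X^T X w_hat = X^T y directly
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check with NumPy's least-squares solver
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(w_normal, true_w_hat), np.allclose(w_lstsq, true_w_hat))
```

Since the data contain no noise, both approaches recover the parameters that generated them. If $X^TX$ is singular (for example with duplicated features), `np.linalg.solve` raises an error, matching the invertibility condition stated above.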
3 Verification in code
Suppose we have the following data:
$x$ | $y$ |
---|---|
1 | 2 |
3 | 4 |
We want to use least squares to obtain a first-degree (linear) fit to this data. The procedure is as follows.
From the data we have:
$$X = \left[\begin{array}{cc} 1 &1 \\ 3 &1 \end{array}\right], \quad y = \left[\begin{array}{c} 2 \\ 4 \end{array}\right]$$
The parameters to be fitted are:
$$\hat w = [w,b]^T$$
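Then:
$$\hat w = (X^TX)^{-1}X^Ty = \left[\begin{array}{cc} 10 & 4 \\ 4 & 2 \end{array}\right]^{-1}\left[\begin{array}{c} 14 \\ 6 \end{array}\right] = \left[\begin{array}{cc} 0.5 & -1 \\ -1 & 2.5 \end{array}\right]\left[\begin{array}{c} 14 \\ 6 \end{array}\right] = \left[\begin{array}{c} 1 \\ 1 \end{array}\right]$$
So the fitted function is:
$$y=x+1$$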
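Python implementation:
```python
import numpy as np  # NumPy provides the matrix operations used below

X = np.array([[1, 1], [3, 1]])      # matrix X; the column of ones multiplies b
y = np.array([2, 4]).reshape(2, 1)  # observed values as a column vector
# Closed-form solution: w_hat = (X^T X)^{-1} X^T y
result = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)
# In the result, the last value is b; the others, from top to bottom,
# are the coefficients of x1, x2, ...
print("Fitted parameters:", result)
```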
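As an independent cross-check of the result $y = x + 1$, the same two data points can be passed to NumPy's built-in polynomial fit. This is only a sketch for comparison, not the article's method: `np.polyfit` with `deg=1` returns the coefficients from the highest degree down, so the slope comes first and the intercept second.

```python
import numpy as np

# The same data as in the table above
x = np.array([1.0, 3.0])
y = np.array([2.0, 4.0])

# First-degree fit; coefficients are ordered [slope, intercept]
slope, intercept = np.polyfit(x, y, deg=1)
print(slope, intercept)  # approximately 1.0 1.0
```

The slope corresponds to $w$ and the intercept to $b$ in $\hat w = [w,b]^T$.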
Because CSDN's Markdown editor could not render some of the formulas correctly, images were used for them. The original md file is at: https://gitee.com/image111111/image1/raw/master/%E6%9C%80%E5%B0%8F%E4%BA%8C%E4%B9%98%E6%B3%95%E7%9A%84%E7%9F%A9%E9%98%B5%E8%A1%A8%E8%BE%BE.md