1. Forward propagation
1.1 Convolutional layer: the convolution operation
Suppose the convolution is as follows, where conv denotes the convolution operation, $w$ holds the kernel weights (the kernel is $2 \times 2$) and $b$ is the bias. Assume the feature map output by the previous layer is $3 \times 3$. Applying the convolution and adding the bias gives:

$$
\begin{bmatrix} a_{11}^{l-1} & a_{12}^{l-1} & a_{13}^{l-1} \\ a_{21}^{l-1} & a_{22}^{l-1} & a_{23}^{l-1} \\ a_{31}^{l-1} & a_{32}^{l-1} & a_{33}^{l-1} \end{bmatrix} \ \text{conv}\ \begin{bmatrix} w_{11}^{l} & w_{12}^{l} \\ w_{21}^{l} & w_{22}^{l} \end{bmatrix} + \begin{bmatrix} b_{11}^{l} & b_{12}^{l} \\ b_{21}^{l} & b_{22}^{l} \end{bmatrix} = \begin{bmatrix} z_{11}^{l} & z_{12}^{l} \\ z_{21}^{l} & z_{22}^{l} \end{bmatrix} \tag{1}
$$
Here $\hat y$ denotes the predicted value, obtained by applying the activation function to the output:

$$
\hat y = \sigma (z^{l}) \tag{2}
$$
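Formulas (1) and (2) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the helper names `conv2d_valid`, `sigmoid` and `conv_forward` and all numeric values are hypothetical and not taken from the article, the activation is assumed to be the sigmoid, and conv is implemented as a sliding-window multiply-and-add without flipping the kernel, which is how the article later expands each $z$.

```python
import math

def conv2d_valid(a, w):
    """Slide kernel w over a (stride 1, no padding, no kernel flip)."""
    kh, kw = len(w), len(w[0])
    out_h, out_w = len(a) - kh + 1, len(a[0]) - kw + 1
    return [[sum(a[i + u][j + v] * w[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def conv_forward(a_prev, w, b):
    """Return (z, y_hat) with z = a_prev conv w + b and y_hat = sigmoid(z)."""
    z = conv2d_valid(a_prev, w)
    z = [[z[i][j] + b[i][j] for j in range(len(z[0]))] for i in range(len(z))]
    y_hat = [[sigmoid(value) for value in row] for row in z]
    return z, y_hat

# Illustrative values only (assumed, not from the article).
a_prev = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # 3x3 feature map a^{l-1}
w = [[1, 0], [0, -1]]                        # 2x2 kernel w^l
b = [[4, 5], [6, 7]]                         # 2x2 bias b^l, as in formula (1)

z, y_hat = conv_forward(a_prev, w, b)
print(z)             # [[0, 1], [2, 3]]
print(y_hat[0][0])   # 0.5
```

A $3 \times 3$ input and a $2 \times 2$ kernel give a $2 \times 2$ output, matching the shapes in formula (1).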
1.2 Pooling layer: downsampling
Pooling comes in two common forms, average pooling and max pooling; average pooling is used as the example here. It shrinks the original matrix by a given ratio: neighbouring elements are averaged to produce each element of the smaller matrix.

$$
\text{matrix} = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 9 & 10 & 11 & 12 \\ 13 & 14 & 15 & 16 \end{bmatrix}
$$

With a pooling window of $2 \times 2$, the $4 \times 4$ matrix is reduced to a $2 \times 2$ output.
- For the element in the first row and first column: average the elements of the original matrix in the region {(0, 0), (0, 1), (1, 0), (1, 1)}: (1 + 2 + 5 + 6) / 4 = 3.5, and assign the result to scaledMatrix[0][0].
- The final matrix is:

$$
\begin{bmatrix} 3.5 & 5.5 \\ 11.5 & 13.5 \end{bmatrix}
$$
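The worked example above can be reproduced with a few lines of Python. This is a minimal sketch: the function name `avg_pool2d` is hypothetical, `scaled_matrix` plays the role of the article's scaledMatrix, and the code assumes non-overlapping windows and matrix dimensions that are exact multiples of the window size.

```python
def avg_pool2d(matrix, size=2):
    """Average-pool a 2D list with non-overlapping size x size windows."""
    pooled = []
    for i in range(0, len(matrix), size):
        pooled_row = []
        for j in range(0, len(matrix[0]), size):
            window = [matrix[i + u][j + v]
                      for u in range(size) for v in range(size)]
            pooled_row.append(sum(window) / len(window))
        pooled.append(pooled_row)
    return pooled

matrix = [[1, 2, 3, 4],
          [5, 6, 7, 8],
          [9, 10, 11, 12],
          [13, 14, 15, 16]]

scaled_matrix = avg_pool2d(matrix)
print(scaled_matrix)  # [[3.5, 5.5], [11.5, 13.5]]
```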
2. Output-layer error term
The error term of the output layer is computed from the gradient of the loss function with respect to the output.
2.1 Loss functions
- Mean squared error (MSE), suited to regression problems:

$$
MSE=\frac{1}{n} \sum_{i=1}^{n}(\hat{y_{i}} - y_{i})^{2}
$$

  where $y_i$ is the true value and $\hat{y_{i}}$ is the predicted value.
- Cross-entropy loss, suited to classification problems:

$$
\text{Cross-entropy loss} = -\sum_{i=1}^{n}y_{i}\log(\hat{y_{i}})
$$
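Both loss functions translate directly into code. The sketch below is illustrative only: the function names and sample values are hypothetical, and the small `eps` clipping inside the logarithm is an added assumption (it is not part of the formula) that avoids `log(0)` when a predicted probability is exactly zero.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: (1/n) * sum((y_hat_i - y_i)^2)."""
    n = len(y_true)
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy: -sum(y_i * log(y_hat_i)); eps is an assumed safeguard."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

# Illustrative values only (assumed).
regression_loss = mse([1.0, 2.0], [1.0, 4.0])                   # (0 + 4) / 2
classification_loss = cross_entropy([0, 1, 0], [0.25, 0.5, 0.25])  # -log(0.5)

print(regression_loss)       # 2.0
print(classification_loss)   # about 0.693, i.e. log(2)
```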
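2.2 Deriving the error term
To simplify the calculation we choose MSE as the loss function and take the prefactor $\frac{1}{n}$ to be $\frac{1}{2}$ (that is, n = 2), so that the factor 2 produced by differentiation cancels. With $y_i$ the true value and $\hat{y_{i}}$ the predicted value, the loss function $J$ is:

$$
J = \frac{1}{2}(\hat{y_{i}} - y_{i})^{2} \tag{3}
$$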
From (2) we know the expression for $\hat{y_{i}}$, so we can compute the partial derivative of the loss function with respect to the weighted input $z^{l}$ of the output layer, using the chain rule:

$$
\frac{\partial J}{\partial z^l}=\frac{\partial J}{\partial \hat y}\cdot \frac{\partial \hat y}{\partial z^l} \tag{4}
$$
In this formula, $\frac{\partial J}{\partial \hat y}$ can be computed directly, because $J$ is the mean squared error of (3):

$$
\frac{\partial J}{\partial \hat y}=\hat{y} - y \tag{5}
$$
The error term of the output layer, the gradient of the loss function with respect to $z^l$, is therefore:

$$
\delta^{l} =\frac{\partial J}{\partial z^l}=(\hat{y} - y)\cdot \sigma ' (z^{l}) \tag{6}
$$
Assume the activation function used here is the sigmoid function. Then:

$$
\frac{\partial \hat y}{\partial z^l}=\sigma ' (z^{l})=\sigma (z^{l})\cdot (1-\sigma (z^{l}))\tag{7}
$$

Substituting (7) into (6) gives:

$$
\frac{\partial J}{\partial z^l}=(\hat{y} - y)\cdot\sigma ' (z^{l})=(\hat{y} - y)\cdot\sigma (z^{l})\cdot (1-\sigma (z^{l}))\tag{8}
$$
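Formula (8) can be checked numerically for a single output unit. The following sketch uses hypothetical helper names and sample values; it compares the analytic error term with a central finite difference of the loss from formula (3), which is one simple way to gain confidence in a hand-derived gradient.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    """Formula (7): sigma'(z) = sigma(z) * (1 - sigma(z))."""
    s = sigmoid(x)
    return s * (1.0 - s)

def loss(z, y):
    """Formula (3) with y_hat = sigmoid(z)."""
    return 0.5 * (sigmoid(z) - y) ** 2

def output_delta(z, y):
    """Formula (8): delta = (y_hat - y) * sigma'(z)."""
    return (sigmoid(z) - y) * sigmoid_prime(z)

# Illustrative values only (assumed).
z, y = 0.0, 1.0
delta = output_delta(z, y)   # (0.5 - 1) * 0.5 * 0.5

# Central finite difference of the loss with respect to z.
h = 1e-6
numeric = (loss(z + h, y) - loss(z - h, y)) / (2 * h)

print(delta)                        # -0.125
print(abs(numeric - delta) < 1e-6)  # True
```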
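3. Given the convolutional layer's error, derive the previous layer's error (deconvolution)
3.1 Error term of the pooling layer
Suppose the error term of the convolutional layer is $\delta^{l}$ and the previous layer is a pooling layer with error term $\delta^{l-1}$. We use the convolutional layer's error term $\delta^l$ to derive the error term of the previous layer.
3.1.1 Derivation
In a convolutional layer, the convolution is followed by an activation function, as in formula (2). Expanding that formula further:

$$
\hat y=a^{l} =\sigma(z^l)=\sigma(a^{l-1}*W^l + b^l) \tag{9}
$$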
$$
\frac{\partial \hat y}{\partial z^l}=\frac{\partial a^l}{\partial z^l}=\sigma ' (z^l) \tag{10}
$$
The error term $\delta^{l-1}$ is then:

$$
\begin{aligned}
\delta^{l-1} &= \frac{\partial J}{\partial z^{l-1}} \quad \text{(expand with the chain rule)} \\
&= \frac{\partial J}{\partial z^{l}} \cdot \frac{\partial z^{l}}{\partial z^{l-1}} \\
&= \delta^{l}\cdot \frac{\partial z^{l}}{\partial a^{l-1}}\cdot \frac{\partial a^{l-1}}{\partial z^{l-1}} \\
&= \delta^{l}\cdot \frac{\partial z^{l}}{\partial a^{l-1}}\cdot \sigma ' (z^{l-1})
\end{aligned} \tag{11}
$$
Let us now look at some of the terms in this expression one by one:
- $\frac{\partial z^{l}}{\partial a^{l-1}}$

  The following formula is easy to obtain (compare the convolution formula of the convolutional layer), writing the convolution schematically as a product:

$$
z^{l}=w^l\cdot a^{l-1} + b^l \tag{12}
$$

  Differentiating formula (12) gives:

$$
\frac{\partial z^{l}}{\partial a^{l-1}} = w^l \tag{13}
$$

- How are the two factors in $\delta^{l}\cdot \frac{\partial z^{l}}{\partial a^{l-1}}$ related?

$$
\begin{aligned}
\nabla a &= \delta^{l}\cdot \frac{\partial z^{l}}{\partial a^{l-1}} \quad \text{(chain rule)} \\
&= \delta^{l}\cdot w^{l} \\
&= \frac{\partial J}{\partial z^{l}}\cdot\frac{\partial z^{l}}{\partial a^{l-1}} \\
&= \frac{\partial J}{\partial a^{l-1}}
\end{aligned} \tag{14}
$$

  This formula shows that $\nabla a$ is the derivative of the loss function $J$ with respect to $a^{l-1}$, that is, the error associated with each value of the matrix. We update the network's parameters according to how the loss function changes, which improves the network's performance; this is the gradient descent algorithm.
- Use $\nabla a$ to work out the error term of the convolutional layer and see what pattern emerges.
Using the convolution example from section 1.1, we write out each $z$ explicitly (layer superscripts are omitted):

$$
\begin{aligned}
z_{11} &= a_{11} \cdot w_{11} + a_{12} \cdot w_{12} + a_{21} \cdot w_{21} + a_{22} \cdot w_{22} + b_{11} \\
z_{12} &= a_{12} \cdot w_{11} + a_{13} \cdot w_{12} + a_{22} \cdot w_{21} + a_{23} \cdot w_{22} + b_{12} \\
z_{21} &= a_{21} \cdot w_{11} + a_{22} \cdot w_{12} + a_{31} \cdot w_{21} + a_{32} \cdot w_{22} + b_{21} \\
z_{22} &= a_{22} \cdot w_{11} + a_{23} \cdot w_{12} + a_{32} \cdot w_{21} + a_{33} \cdot w_{22} + b_{22}
\end{aligned} \tag{15}
$$
From formula (15) we obtain each component of $\nabla a$:

$$
\begin{aligned}
\nabla a_{11} &= \frac{\partial J}{\partial z_{11}} \cdot \frac{\partial z_{11}}{\partial a_{11}} = \delta_{11}\cdot w_{11} \\
\nabla a_{12} &= \frac{\partial J}{\partial z_{12}} \cdot \frac{\partial z_{12}}{\partial a_{12}} + \frac{\partial J}{\partial z_{11}} \cdot \frac{\partial z_{11}}{\partial a_{12}} = \delta_{12}\cdot w_{11} + \delta_{11}\cdot w_{12} \\
\nabla a_{13} &= \frac{\partial J}{\partial z_{12}} \cdot \frac{\partial z_{12}}{\partial a_{13}} = \delta_{12}\cdot w_{12} \\
\nabla a_{21} &= \frac{\partial J}{\partial z_{11}} \cdot \frac{\partial z_{11}}{\partial a_{21}} + \frac{\partial J}{\partial z_{21}} \cdot \frac{\partial z_{21}}{\partial a_{21}} = \delta_{11}\cdot w_{21} + \delta_{21}\cdot w_{11} \\
\nabla a_{22} &= \frac{\partial J}{\partial z_{11}} \cdot \frac{\partial z_{11}}{\partial a_{22}} + \frac{\partial J}{\partial z_{12}} \cdot \frac{\partial z_{12}}{\partial a_{22}} + \frac{\partial J}{\partial z_{21}} \cdot \frac{\partial z_{21}}{\partial a_{22}} + \frac{\partial J}{\partial z_{22}} \cdot \frac{\partial z_{22}}{\partial a_{22}} \\
&= \delta_{11} \cdot w_{22} + \delta_{12}\cdot w_{21} + \delta_{21}\cdot w_{12} + \delta_{22}\cdot w_{11} \\
\nabla a_{23} &= \frac{\partial J}{\partial z_{12}} \cdot \frac{\partial z_{12}}{\partial a_{23}} + \frac{\partial J}{\partial z_{22}} \cdot \frac{\partial z_{22}}{\partial a_{23}} = \delta_{12}\cdot w_{22} + \delta_{22}\cdot w_{12} \\
\nabla a_{31} &= \frac{\partial J}{\partial z_{21}} \cdot \frac{\partial z_{21}}{\partial a_{31}} = \delta_{21}\cdot w_{21} \\
\nabla a_{32} &= \frac{\partial J}{\partial z_{21}} \cdot \frac{\partial z_{21}}{\partial a_{32}} + \frac{\partial J}{\partial z_{22}} \cdot \frac{\partial z_{22}}{\partial a_{32}} = \delta_{21}\cdot w_{22} + \delta_{22}\cdot w_{21} \\
\nabla a_{33} &= \frac{\partial J}{\partial z_{22}} \cdot \frac{\partial z_{22}}{\partial a_{33}} = \delta_{22}\cdot w_{22}
\end{aligned}
$$
Written as a convolution, this becomes:

$$
\begin{bmatrix} \nabla a_{11} & \nabla a_{12} & \nabla a_{13} \\ \nabla a_{21} & \nabla a_{22} & \nabla a_{23} \\ \nabla a_{31} & \nabla a_{32} & \nabla a_{33} \end{bmatrix}=\begin{bmatrix} \delta_{11}\cdot w_{11} & \delta_{12}\cdot w_{11} + \delta_{11}\cdot w_{12} & \delta_{12}\cdot w_{12} \\ \delta_{11}\cdot w_{21} + \delta_{21}\cdot w_{11} & \delta_{11} \cdot w_{22} + \delta_{12}\cdot w_{21} + \delta_{21}\cdot w_{12} + \delta_{22}\cdot w_{11} & \delta_{12}\cdot w_{22} + \delta_{22}\cdot w_{12} \\ \delta_{21}\cdot w_{21} & \delta_{21}\cdot w_{22} + \delta_{22}\cdot w_{21} & \delta_{22}\cdot w_{22} \end{bmatrix}
$$

$$
\begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & \delta_{11} & \delta_{12} & 0 \\ 0 & \delta_{21} & \delta_{22} & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \ \text{conv}\ \begin{bmatrix} w_{22} & w_{21} \\ w_{12} & w_{11} \end{bmatrix}= \begin{bmatrix} \delta_{11}\cdot w_{11} & \delta_{12}\cdot w_{11} + \delta_{11}\cdot w_{12} & \delta_{12}\cdot w_{12} \\ \delta_{11}\cdot w_{21} + \delta_{21}\cdot w_{11} & \delta_{11} \cdot w_{22} + \delta_{12}\cdot w_{21} + \delta_{21}\cdot w_{12} + \delta_{22}\cdot w_{11} & \delta_{12}\cdot w_{22} + \delta_{22}\cdot w_{12} \\ \delta_{21}\cdot w_{21} & \delta_{21}\cdot w_{22} + \delta_{22}\cdot w_{21} & \delta_{22}\cdot w_{22} \end{bmatrix}
$$
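The equivalence between the element-by-element chain rule and the padded convolution can be verified on small numbers. The sketch below is illustrative: all function names and values are hypothetical, and `grad_input_conv` assumes a square kernel and stride 1. `grad_input_direct` accumulates each product $\delta_{ij} \cdot w_{uv}$ into position $(i+u, j+v)$, while `grad_input_conv` zero-pads $\delta$ and slides the kernel rotated by 180 degrees over it.

```python
def conv2d_valid(a, w):
    """Slide kernel w over a (stride 1, no padding, no kernel flip)."""
    kh, kw = len(w), len(w[0])
    out_h, out_w = len(a) - kh + 1, len(a[0]) - kw + 1
    return [[sum(a[i + u][j + v] * w[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def rot180(w):
    """Rotate a kernel by 180 degrees (reverse rows, then columns)."""
    return [row[::-1] for row in w[::-1]]

def zero_pad(m, p):
    """Surround matrix m with p rings of zeros."""
    width = len(m[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]
    padded += [[0] * p + list(row) + [0] * p for row in m]
    padded += [[0] * width for _ in range(p)]
    return padded

def grad_input_direct(delta, w, in_h, in_w):
    """Chain rule, term by term: z[i][j] depends on a[i+u][j+v] via w[u][v]."""
    grad = [[0] * in_w for _ in range(in_h)]
    for i in range(len(delta)):
        for j in range(len(delta[0])):
            for u in range(len(w)):
                for v in range(len(w[0])):
                    grad[i + u][j + v] += delta[i][j] * w[u][v]
    return grad

def grad_input_conv(delta, w):
    """Padded delta conv rot180(w); assumes a square kernel."""
    return conv2d_valid(zero_pad(delta, len(w) - 1), rot180(w))

# Illustrative values only (assumed).
delta = [[1, 2], [3, 4]]
w = [[10, 20], [30, 40]]

direct = grad_input_direct(delta, w, 3, 3)
via_conv = grad_input_conv(delta, w)
print(direct)              # [[10, 40, 40], [60, 200, 160], [90, 240, 160]]
print(direct == via_conv)  # True
```

The centre entry, 200, is $\delta_{11} w_{22} + \delta_{12} w_{21} + \delta_{21} w_{12} + \delta_{22} w_{11} = 40 + 60 + 60 + 40$, in agreement with the expression for $\nabla a_{22}$.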
3.1.2 Expressing the error term
In other words, the error term of the previous (pooling) layer is obtained by convolving the zero-padded error term of the convolutional layer with the kernel rotated by 180 degrees. Formula (11) can therefore be simplified further. (The pooling layer has no activation function; equivalently, its activation can be taken as $\sigma (x) = x$, whose derivative is 1.)

$$
\delta^{l-1} =\delta^{l}\cdot \frac{\partial z^{l}}{\partial a^{l-1}}\cdot \sigma ' (z^{l-1})= \delta^{l}\ \text{conv}\ (\text{rot180}(w^l))\cdot \sigma ' (z^{l-1}) \tag{16}
$$
3.2 Deriving the gradients of W and b
3.2.1 Derivation
We know that the following is the error term with respect to the values of the feature map:

$$
\delta^{l-1} =\frac{\partial J}{\partial z^{l-1}}
$$
Similarly, the gradient with respect to $W$ is:

$$
\frac{\partial J}{\partial W^{l}} = \frac{\partial J}{\partial z^{l}} \cdot \frac{\partial z^{l}}{\partial W^{l}} = \delta^{l}\frac{\partial z^{l}}{\partial W^{l}} \tag{17}
$$
In the same way as for $\nabla a$, we can compute the gradient of $W$, written $\nabla W$. For example:

$$
\begin{aligned}
\nabla W _{11} &= \frac{\partial J}{\partial z_{11}} \cdot \frac{\partial z_{11}}{\partial W_{11}} + \frac{\partial J}{\partial z_{12}} \cdot \frac{\partial z_{12}}{\partial W_{11}} + \frac{\partial J}{\partial z_{21}} \cdot \frac{\partial z_{21}}{\partial W_{11}} + \frac{\partial J}{\partial z_{22}} \cdot \frac{\partial z_{22}}{\partial W_{11}} \\
&= \delta _{11} a_{11} + \delta _{12} a_{12} + \delta _{21} a_{21} +\delta _{22} a_{22}
\end{aligned}
$$
$\nabla W _{12}, \nabla W _{21},\nabla W _{22}$ are obtained in the same way, which gives:

$$
\begin{bmatrix} \nabla W _{11} & \nabla W _{12} \\ \nabla W _{21} & \nabla W _{22} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \ \text{conv}\ \begin{bmatrix} \delta _{11} & \delta _{12} \\ \delta _{21} & \delta _{22} \end{bmatrix} \tag{18}
$$
3.2.2 Expressing the error term
The gradient of the weights is therefore:

$$
\frac{\partial J}{\partial W^{l}} = a^{l-1}\ \text{conv}\ (\delta^{l}) \tag{19}
$$
3.2.3 Error of the bias term b

$$
\frac{\partial J}{\partial b^{l}} = \sum _{u,v}(\delta^{l})_{uv} \tag{20}
$$

The sum over all positions $(u, v)$ applies when a single bias value is shared by every output position.
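Formulas (18) to (20) can be sketched as follows. The names and values are hypothetical, conv is again a sliding-window multiply-and-add without kernel flip, and the bias is assumed to be one scalar shared across all output positions, which is the case in which the sum in formula (20) applies.

```python
def conv2d_valid(a, w):
    """Slide kernel w over a (stride 1, no padding, no kernel flip)."""
    kh, kw = len(w), len(w[0])
    out_h, out_w = len(a) - kh + 1, len(a[0]) - kw + 1
    return [[sum(a[i + u][j + v] * w[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def grad_weights(a_prev, delta):
    """Formula (19): dJ/dW = a^{l-1} conv delta^l."""
    return conv2d_valid(a_prev, delta)

def grad_bias(delta):
    """Formula (20): dJ/db = sum of delta over all positions (shared bias assumed)."""
    return sum(sum(row) for row in delta)

# Illustrative values only (assumed).
a_prev = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
delta = [[1, 2], [3, 4]]

grad_w = grad_weights(a_prev, delta)
grad_b = grad_bias(delta)
print(grad_w)  # [[37, 47], [67, 77]]
print(grad_b)  # 10
```

Here `grad_w[0][0]` is $\delta_{11} a_{11} + \delta_{12} a_{12} + \delta_{21} a_{21} + \delta_{22} a_{22} = 1 + 4 + 12 + 20 = 37$, matching the expression for $\nabla W_{11}$.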
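4. Given the pooling layer's error, derive the previous layer's error (unpooling)
In a CNN, the pooling layer mainly rescales the matrix. In forward propagation it downsamples; in backpropagation we go in the opposite direction, so we upsample to fill in the error terms.
The pooling layer has no activation function. There are two main kinds of pooling layer: max pooling and average pooling. Denote the error term of the pooling layer by $\delta^l$.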
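4.1 Error term of the average pooling layer
Suppose the pooling layer scales an $8 \times 8$ matrix down to a $4 \times 4$ feature map. In backpropagation, each error value is divided evenly among the positions of its pooling window. The smaller example here upsamples a $2 \times 2$ error term to $4 \times 4$ with a $2 \times 2$ window:

$$
\text{Average pooling:}\quad \begin{bmatrix} 1 & 2 \\ 8 & 4 \end{bmatrix} \xrightarrow{\text{backpropagation}} \begin{bmatrix} 0.25 & 0.25 & 0.5 & 0.5 \\ 0.25 & 0.25 & 0.5 & 0.5 \\ 2 & 2 & 1 & 1 \\ 2 & 2 & 1 & 1 \end{bmatrix}
$$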
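The upsampling rule for average pooling can be sketched in a few lines. The function name `avg_pool_backward` is hypothetical, and non-overlapping square windows are assumed: each error value is divided by the number of positions in its window and copied to all of them.

```python
def avg_pool_backward(delta, size=2):
    """Spread each error value evenly over its size x size pooling window."""
    share = size * size
    grad = []
    for row in delta:
        expanded = []
        for value in row:
            expanded.extend([value / share] * size)
        for _ in range(size):
            grad.append(list(expanded))
    return grad

delta = [[1, 2], [8, 4]]
upsampled = avg_pool_backward(delta)
for row in upsampled:
    print(row)
# [0.25, 0.25, 0.5, 0.5]
# [0.25, 0.25, 0.5, 0.5]
# [2.0, 2.0, 1.0, 1.0]
# [2.0, 2.0, 1.0, 1.0]
```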
Source: https://www.toymoban.com/news/detail-640257.html
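4.2 Error term of the max pooling layer
In backpropagation through max pooling, each error value is placed at the position where the maximum was found during forward propagation, and every other position in the window receives 0. (The forward pooling pass therefore has to record the original position of each maximum.)

$$
\text{Max pooling:}\quad \begin{bmatrix} 1 & 2 \\ 8 & 4 \end{bmatrix} \xrightarrow{\text{backpropagation}} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 8 & 0 & 0 \\ 0 & 0 & 4 & 0 \end{bmatrix}
$$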
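Recording the position of each maximum in the forward pass and routing the error back to it can be sketched as below. The function names are hypothetical, and the $4 \times 4$ input is made up so that its maxima fall at the same positions as in the example above; non-overlapping windows are assumed.

```python
def max_pool_forward(matrix, size=2):
    """Return (pooled, argmax); argmax holds the (row, col) of each window's maximum."""
    pooled, argmax = [], []
    for i in range(0, len(matrix), size):
        pooled_row, argmax_row = [], []
        for j in range(0, len(matrix[0]), size):
            positions = [(i + u, j + v)
                         for u in range(size) for v in range(size)]
            best = max(positions, key=lambda p: matrix[p[0]][p[1]])
            pooled_row.append(matrix[best[0]][best[1]])
            argmax_row.append(best)
        pooled.append(pooled_row)
        argmax.append(argmax_row)
    return pooled, argmax

def max_pool_backward(delta, argmax, in_h, in_w):
    """Place each error value at the recorded maximum position; zeros elsewhere."""
    grad = [[0] * in_w for _ in range(in_h)]
    for i, row in enumerate(delta):
        for j, value in enumerate(row):
            r, c = argmax[i][j]
            grad[r][c] = value
    return grad

# Hypothetical forward input, chosen only to match the maxima positions above.
x = [[9, 1, 2, 3],
     [4, 5, 8, 6],
     [1, 7, 2, 3],
     [0, 2, 9, 4]]
pooled, argmax = max_pool_forward(x)

delta = [[1, 2], [8, 4]]
grad = max_pool_backward(delta, argmax, 4, 4)
for row in grad:
    print(row)
# [1, 0, 0, 0]
# [0, 0, 2, 0]
# [0, 8, 0, 0]
# [0, 0, 4, 0]
```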