[Deep Learning] Dive into Deep Learning (PyTorch Edition) by Mu Li, 2.4.3 Gradients [Formula Derivations]


2.4.3. Gradients

We can concatenate the partial derivatives of a multivariate function with respect to all of its variables to obtain the function's gradient vector. Specifically, let the function $f:\mathbb{R}^{n}\to\mathbb{R}$ take as input an $n$-dimensional vector $\vec x=\begin{bmatrix} x_1\\x_2\\\vdots\\x_n\end{bmatrix}$ and output a scalar. The gradient of $f(\vec x)$ with respect to $\vec x$ is the vector of its $n$ partial derivatives:

$$\nabla_{\vec x} f(\vec x) = \begin{bmatrix}\dfrac{\partial f(\vec x)}{\partial x_1}\\[2pt]\dfrac{\partial f(\vec x)}{\partial x_2}\\\vdots\\ \dfrac{\partial f(\vec x)}{\partial x_n}\end{bmatrix}$$

When there is no ambiguity, $\nabla_{\vec x} f(\vec x)$ is usually written simply as $\nabla f(\vec x)$.
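The definition above suggests a direct numeric approximation: perturb one coordinate at a time and take a central difference. The helper below is a minimal sketch (not from the book; the function name and example are my own):

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Approximate the gradient of a scalar-valued f at x by central differences."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps  # perturb only coordinate i
        grad[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return grad

# Example: f(x) = x1^2 + 3*x2, whose gradient is [2*x1, 3]
f = lambda x: x[0] ** 2 + 3 * x[1]
x = np.array([1.0, 2.0])
print(numerical_gradient(f, x))  # close to [2., 3.]
```

Central differences are exact for quadratics up to floating-point rounding, which makes them a convenient check for the identities below.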


Let $\vec x$ be an $n$-dimensional vector. The following rules are frequently used when differentiating multivariate functions:

Rule 1. For all $A \in \mathbb{R}^{m\times n}$, $\nabla_{\vec x} A\vec x = A^\top$.

Proof. Let
$$A = \begin{bmatrix} a_{1,1}&a_{1,2}&\cdots&a_{1,n} \\ a_{2,1}&a_{2,2}&\cdots&a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} &\cdots&a_{m,n} \end{bmatrix},$$
so that
$$A\vec x = \begin{bmatrix} a_{1,1}x_1+a_{1,2}x_2+\cdots+a_{1,n}x_n \\ a_{2,1}x_1+a_{2,2}x_2+\cdots+a_{2,n}x_n \\ \vdots \\ a_{m,1}x_1+a_{m,2}x_2+\cdots+a_{m,n}x_n \end{bmatrix} \in \mathbb{R}^{m}.$$
Differentiating each component of $A\vec x$ with respect to each $x_i$, with row $i$ holding the partials with respect to $x_i$:
$$\nabla_{\vec x}A\vec x = \begin{bmatrix}\dfrac{\partial (A\vec x)_1}{\partial x_1}& \dfrac{\partial (A\vec x)_2}{\partial x_1}&\cdots&\dfrac{\partial (A\vec x)_m}{\partial x_1}\\ \dfrac{\partial (A\vec x)_1}{\partial x_2}& \dfrac{\partial (A\vec x)_2}{\partial x_2}&\cdots&\dfrac{\partial (A\vec x)_m}{\partial x_2}\\ \vdots&\vdots&\ddots&\vdots\\ \dfrac{\partial (A\vec x)_1}{\partial x_n}& \dfrac{\partial (A\vec x)_2}{\partial x_n}&\cdots&\dfrac{\partial (A\vec x)_m}{\partial x_n}\end{bmatrix} = \begin{bmatrix} a_{1,1} & a_{2,1} & \cdots & a_{m,1}\\ a_{1,2} & a_{2,2} & \cdots & a_{m,2} \\ \vdots&\vdots&\ddots&\vdots \\ a_{1,n}&a_{2,n}&\cdots&a_{m,n} \end{bmatrix} = A^\top.$$
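Rule 1 can be sanity-checked numerically (this check is my own, not from the book; the matrix sizes are arbitrary). Because $A\vec x$ is linear, a central difference recovers each column of partials exactly up to rounding:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)

eps = 1e-6
grad = np.zeros((n, m))  # row i holds d(Ax)/dx_i, an m-vector
for i in range(n):
    e = np.zeros(n)
    e[i] = eps
    grad[i] = (A @ (x + e) - A @ (x - e)) / (2 * eps)

print(np.allclose(grad, A.T, atol=1e-5))  # True
```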

Rule 2. For all $A \in \mathbb{R}^{n\times m}$, $\nabla_{\vec x} \vec x^\top A = A$.

Proof. Let
$$A = \begin{bmatrix} a_{1,1}&a_{1,2}&\cdots&a_{1,m} \\ a_{2,1}&a_{2,2}&\cdots&a_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n,1} & a_{n,2} &\cdots&a_{n,m} \end{bmatrix},$$
so that $\vec x^\top A$ is the row vector with components
$$(\vec x^\top A)_j = a_{1,j}x_1 + a_{2,j}x_2 + \cdots + a_{n,j}x_n, \qquad j = 1,\dots,m.$$
Then $\dfrac{\partial (\vec x^\top A)_j}{\partial x_i} = a_{i,j}$, so with row $i$ holding the partials with respect to $x_i$:
$$\nabla_{\vec x}\vec x^\top A = \begin{bmatrix}\dfrac{\partial (\vec x^\top A)_1}{\partial x_1}& \dfrac{\partial (\vec x^\top A)_2}{\partial x_1}&\cdots&\dfrac{\partial (\vec x^\top A)_m}{\partial x_1}\\ \vdots&\vdots&\ddots&\vdots\\ \dfrac{\partial (\vec x^\top A)_1}{\partial x_n}& \dfrac{\partial (\vec x^\top A)_2}{\partial x_n}&\cdots&\dfrac{\partial (\vec x^\top A)_m}{\partial x_n}\end{bmatrix} = \begin{bmatrix} a_{1,1} & a_{1,2}&\cdots&a_{1,m}\\ a_{2,1}&a_{2,2}&\cdots&a_{2,m} \\ \vdots&\vdots&\ddots&\vdots\\ a_{n,1}&a_{n,2}&\cdots&a_{n,m} \end{bmatrix} = A.$$
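The same style of numeric check works for Rule 2 (again my own sketch, with arbitrary sizes); here the matrix of partials should reproduce $A$ itself:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 3
A = rng.standard_normal((n, m))
x = rng.standard_normal(n)

eps = 1e-6
grad = np.zeros((n, m))  # row i holds d(x'A)/dx_i, an m-vector
for i in range(n):
    e = np.zeros(n)
    e[i] = eps
    grad[i] = ((x + e) @ A - (x - e) @ A) / (2 * eps)

print(np.allclose(grad, A, atol=1e-5))  # True
```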

Rule 3. For all $A \in \mathbb{R}^{n\times n}$, $\nabla_{\vec x} \vec x^\top A \vec x = (A+A^\top)\vec x$.

Proof. Let
$$A = \begin{bmatrix} a_{1,1}&a_{1,2}&\cdots&a_{1,n} \\ a_{2,1}&a_{2,2}&\cdots&a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n,1} & a_{n,2} &\cdots&a_{n,n} \end{bmatrix}.$$
Expanding the quadratic form as a double sum,
$$\vec x^\top A \vec x = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{i,j}x_ix_j.$$
Differentiating with respect to $x_k$, the terms containing $x_k$ contribute $a_{k,k}\cdot 2x_k$ from the diagonal term, $a_{i,k}x_i$ from each term with $j=k$, and $a_{k,j}x_j$ from each term with $i=k$, which combine into
$$\frac{\partial}{\partial x_k}\sum_{i=1}^{n}\sum_{j=1}^{n} a_{i,j}x_ix_j = \sum_{i=1}^{n}(a_{i,k}+a_{k,i})x_i.$$
Stacking these for $k=1,\dots,n$:
$$\nabla_{\vec x}\vec x^\top A \vec x = \begin{bmatrix} \sum\limits_{i=1}^{n}(a_{i,1}+a_{1,i})x_i \\ \sum\limits_{i=1}^{n}(a_{i,2}+a_{2,i})x_i \\ \vdots\\ \sum\limits_{i=1}^{n}(a_{i,n}+a_{n,i})x_i \end{bmatrix} = \begin{bmatrix} 2a_{1,1} & a_{1,2}+a_{2,1} & \cdots&a_{1,n}+a_{n,1} \\ a_{2,1}+a_{1,2} & 2a_{2,2} & \cdots&a_{2,n}+a_{n,2} \\ \vdots&\vdots&\ddots&\vdots\\ a_{n,1}+a_{1,n} & a_{n,2}+a_{2,n} & \cdots&2a_{n,n} \end{bmatrix} \begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix} = (A+A^\top)\vec x.$$
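Since $\vec x^\top A\vec x$ is quadratic, central differences recover its gradient exactly up to rounding, so Rule 3 can be checked numerically as well (my own sketch, arbitrary size $n$):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)
f = lambda v: v @ A @ v  # the scalar quadratic form x'Ax

eps = 1e-6
I = np.eye(n)
grad = np.array([(f(x + eps * I[k]) - f(x - eps * I[k])) / (2 * eps)
                 for k in range(n)])

print(np.allclose(grad, (A + A.T) @ x, atol=1e-4))  # True
```

Note that the check passes for a general (non-symmetric) $A$; only when $A = A^\top$ does the rule simplify to the familiar $2A\vec x$.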

Rule 4. $\nabla_{\vec x} \Vert \vec x \Vert ^2=\nabla_{\vec x}\vec x^\top\vec x = 2\vec x$.

Proof. First,
$$\Vert \vec x \Vert ^2 = \left(\sqrt{x_1^2+x_2^2+\cdots+x_n^2}\right)^2 = x_1^2+x_2^2+\cdots+x_n^2 = \vec x^\top \vec x,$$
and differentiating term by term,
$$\nabla_{\vec x}\left(x_1^2+x_2^2+\cdots+x_n^2\right) = \begin{bmatrix} 2x_1\\ 2x_2\\ \vdots\\ 2x_n \end{bmatrix} = 2\vec x.$$
(This also follows from Rule 3 with $A = I$, since $(I+I^\top)\vec x = 2\vec x$.)
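A short numeric confirmation of Rule 4 (my own sketch); $\Vert\vec x\Vert^2$ is again quadratic, so central differences should match $2\vec x$ essentially exactly:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(6)
f = lambda v: np.sum(v ** 2)  # ||v||^2 = v'v

eps = 1e-6
grad = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                 for e in np.eye(x.size)])

print(np.allclose(grad, 2 * x, atol=1e-4))  # True
```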

Similarly, for any matrix $X$ we have $\nabla_X \Vert X \Vert_F^2=2X$. As we will see later, gradients are very useful for designing optimization algorithms in deep learning.

Rule 5. For any matrix $X$, $\nabla_X \Vert X \Vert_F^2=2X$.

Proof. Let $X$ be an $m\times n$ matrix,
$$X = \begin{bmatrix} x_{1,1}& x_{1,2}&\cdots&x_{1,n}\\ x_{2,1}& x_{2,2}&\cdots&x_{2,n}\\ \vdots&\vdots&\ddots&\vdots\\ x_{m,1}& x_{m,2}&\cdots&x_{m,n} \end{bmatrix}.$$
By the definition of the Frobenius norm,
$$\Vert X \Vert_F^2 = \left(\sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n} x_{i,j}^2}\right)^2 = \sum_{i=1}^{m}\sum_{j=1}^{n} x_{i,j}^2,$$
so differentiating entrywise,
$$\nabla_X \Vert X \Vert_F^2 = \begin{bmatrix} 2x_{1,1}& 2x_{1,2}&\cdots&2x_{1,n}\\ 2x_{2,1}& 2x_{2,2}&\cdots&2x_{2,n}\\ \vdots&\vdots&\ddots&\vdots\\ 2x_{m,1}& 2x_{m,2}&\cdots&2x_{m,n} \end{bmatrix} = 2X.$$
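Rule 5 can be checked the same way, perturbing one matrix entry at a time (my own sketch, with an arbitrary 2-by-3 matrix):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((2, 3))
f = lambda M: np.sum(M ** 2)  # squared Frobenius norm ||M||_F^2

eps = 1e-6
grad = np.zeros_like(X)
for idx in np.ndindex(*X.shape):
    E = np.zeros_like(X)
    E[idx] = eps  # perturb a single entry
    grad[idx] = (f(X + E) - f(X - E)) / (2 * eps)

print(np.allclose(grad, 2 * X, atol=1e-4))  # True
```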

I did not understand these formulas at first sight, so I derived each of them myself to deepen my understanding. The above is that derivation process; questions and discussion are welcome.
