Nonlinear Least Squares
1 Nonlinear Least Squares Estimation
In classical least squares estimation, the conditional expectation of the dependent variable is assumed to be a linear function of the parameters, for example

E(y \mid x) = a + b x
where a and b are the parameters to be estimated, and E(y|x) is linear in a and b. If, however, E(y|x) is a nonlinear function of the parameters, the normal equations obtained from OLS generally have no analytical solution and can only be solved numerically. Consider a simple nonlinear model
Y_{i} = \beta X_{1i} + \beta^{2} X_{2i} + \varepsilon_{i}
where the disturbances \varepsilon_{i} are independently and identically distributed with \mathrm{E}(\varepsilon_{i}) = 0 and \operatorname{var}(\varepsilon_{i}) = \sigma^{2}. The residual sum of squares is
\begin{aligned} S(\beta) &= \sum_{i=1}^{n} \varepsilon_{i}^{2} = \sum_{i=1}^{n}\left[Y_{i}-f\left(X_{i}, \beta\right)\right]^{2} \\ &= \sum_{i=1}^{n}\left[Y_{i}-\beta X_{1i}-\beta^{2} X_{2i}\right]^{2} \end{aligned}
To bring the regression line as close as possible to the observations, the residual sum of squares must be minimized. By elementary calculus,
\begin{aligned} \frac{\mathrm{d} S}{\mathrm{d} \beta} &= 2 \sum_{i=1}^{n}\left[Y_{i}-f\left(X_{i}, \beta\right)\right]\left(-\frac{\mathrm{d} f\left(X_{i}, \beta\right)}{\mathrm{d} \beta}\right) \\ &= 2 \sum_{i=1}^{n}\left[Y_{i}-\beta X_{1i}-\beta^{2} X_{2i}\right]\left[-X_{1i}-2 \beta X_{2i}\right] = 0 \end{aligned}
Rearranging gives
2 \beta^{3} \sum_{i=1}^{n} X_{2i}^{2}+3 \beta^{2} \sum_{i=1}^{n} X_{1i} X_{2i}+\beta\left(\sum_{i=1}^{n} X_{1i}^{2}-2 \sum_{i=1}^{n} X_{2i} Y_{i}\right)-\sum_{i=1}^{n} X_{1i} Y_{i}=0
This is a cubic equation in the parameter β. Although a cubic does admit closed-form solutions (via Cardano's or Shengjin's formulas), the resulting expressions are extremely unwieldy. If the cubic has real roots β_j (j = 1, 2, 3), of which there are at most three, each root is substituted into the residual sum of squares and the one giving the smallest S(β) is taken as the estimate. In this example the conditional expectation of the dependent variable is a quadratic function of the parameter; once the functional form involves exponentials, logarithms, or trigonometric functions, an analytical solution generally does not exist.
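To make the root-selection step concrete, here is a minimal R sketch on simulated data (the data and the true value β = 1.5 are purely illustrative, not from the text): it assembles the cubic coefficients above, finds all roots with base R's polyroot, and keeps the real root that minimizes the residual sum of squares.

set.seed(1)                                    # illustrative simulated data
n  <- 50
X1 <- runif(n); X2 <- runif(n)
Y  <- 1.5*X1 + 1.5^2*X2 + rnorm(n, 0, 0.1)     # true beta = 1.5
S  <- function(b) sum((Y - b*X1 - b^2*X2)^2)   # residual sum of squares
cf <- c(-sum(X1*Y),                            # cubic coefficients, constant term first
        sum(X1^2) - 2*sum(X2*Y),
        3*sum(X1*X2),
        2*sum(X2^2))
r  <- polyroot(cf)                             # all (complex) roots of the cubic
r  <- Re(r)[abs(Im(r)) < 1e-8]                 # keep the numerically real roots
r[which.min(sapply(r, S))]                     # root minimizing S(beta), close to 1.5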
Numerical analysis therefore becomes the natural tool for problems of this kind. Consider the general nonlinear regression problem, in which the population regression model is
Y = f(X, \beta) + \varepsilon
The corresponding residual sum of squares is
S(\beta)=\sum_{i=1}^{n}\left[Y_{i}-f\left(X_{i}, \beta\right)\right]^{2}
Minimizing it requires the first-order condition
\frac{\mathrm{d} S}{\mathrm{d} \beta}=-2 \sum_{i=1}^{n}\left[Y_{i}-f\left(X_{i}, \beta\right)\right] \frac{\mathrm{d} f\left(X_{i}, \beta\right)}{\mathrm{d} \beta}=0
For a general f this condition has no analytical solution, so we approximate f(X_i, β) by a first-order Taylor expansion. Let β_1 be an initial value for the parameter vector β; then in a neighbourhood of β_1,
f\left(X_{i}, \beta\right) \approx f\left(X_{i}, \beta_{1}\right)+\left.\frac{\mathrm{d} f\left(X_{i}, \beta\right)}{\mathrm{d} \beta}\right|_{\beta=\beta_{1}}\left(\beta-\beta_{1}\right)
The derivative at β_1 can be read as the finite-difference approximation

\left.\frac{\mathrm{d} f\left(X_{i}, \beta\right)}{\mathrm{d} \beta}\right|_{\beta_{1}} \approx \frac{f\left(X_{i}, \beta\right)-f\left(X_{i}, \beta_{1}\right)}{\beta-\beta_{1}}

Writing \widetilde{X}_{i}\left(\beta_{1}\right)=\left.\frac{\mathrm{d} f\left(X_{i}, \beta\right)}{\mathrm{d} \beta}\right|_{\beta_{1}} for short, the residual sum of squares becomes
\begin{aligned} S(\beta) &\approx \sum_{i=1}^{n}\left[Y_{i}-f\left(X_{i}, \beta_{1}\right)-\widetilde{X}_{i}\left(\beta_{1}\right)\left(\beta-\beta_{1}\right)\right]^{2} \\ &= \sum_{i=1}^{n}\left[\widetilde{Y}_{i}\left(\beta_{1}\right)-\widetilde{X}_{i}\left(\beta_{1}\right) \beta\right]^{2} \end{aligned}
where

\widetilde{Y}_{i}\left(\beta_{1}\right)=Y_{i}-f\left(X_{i}, \beta_{1}\right)+\widetilde{X}_{i}\left(\beta_{1}\right) \beta_{1}
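As a concrete illustration (worked out from the toy model at the start of this section, where f(X_i, β) = βX_{1i} + β²X_{2i}), the linearized quantities are simply

\widetilde{X}_{i}\left(\beta_{1}\right)=X_{1i}+2 \beta_{1} X_{2i}, \qquad \widetilde{Y}_{i}\left(\beta_{1}\right)=Y_{i}-\beta_{1} X_{1i}-\beta_{1}^{2} X_{2i}+\left(X_{1i}+2 \beta_{1} X_{2i}\right)\beta_{1}=Y_{i}+\beta_{1}^{2} X_{2i}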
Given the initial value β_1, both Ỹ_i(β_1) and X̃_i(β_1) can be computed from the data, so the linearized residual sum of squares can be minimized directly.
The regression equation corresponding to this S(β) is

\widetilde{Y}_{i}\left(\beta_{1}\right)=\widetilde{X}_{i}\left(\beta_{1}\right) \beta+\varepsilon_{i}
The least squares estimator of this auxiliary regression is

\beta_{2}=\left[\widetilde{X}\left(\beta_{1}\right)^{\prime} \widetilde{X}\left(\beta_{1}\right)\right]^{-1} \widetilde{X}\left(\beta_{1}\right)^{\prime} \widetilde{Y}\left(\beta_{1}\right)
where

\widetilde{X}\left(\beta_{1}\right)=\left[\begin{array}{c} \widetilde{X}_{1}\left(\beta_{1}\right) \\ \vdots \\ \widetilde{X}_{n}\left(\beta_{1}\right) \end{array}\right], \qquad \widetilde{Y}\left(\beta_{1}\right)=\left[\begin{array}{c} \widetilde{Y}_{1}\left(\beta_{1}\right) \\ \vdots \\ \widetilde{Y}_{n}\left(\beta_{1}\right) \end{array}\right]
Having obtained β_2, we take it as the new starting value and repeat the step, producing a sequence of parameter estimates β^{(k)}. The iteration stops as soon as

\left\|\beta^{(k+1)}-\beta^{(k)}\right\| < \delta

where δ > 0 is a pre-specified tolerance on the absolute change. It is straightforward to show that the estimates satisfy the recursion
\begin{aligned} \beta^{(k+1)} &=\left[\widetilde{X}\left(\beta^{(k)}\right)^{\prime} \widetilde{X}\left(\beta^{(k)}\right)\right]^{-1} \widetilde{X}\left(\beta^{(k)}\right)^{\prime} \widetilde{Y}\left(\beta^{(k)}\right) \\ &=\left[\widetilde{X}\left(\beta^{(k)}\right)^{\prime} \widetilde{X}\left(\beta^{(k)}\right)\right]^{-1} \widetilde{X}\left(\beta^{(k)}\right)^{\prime}\left[Y-f\left(X, \beta^{(k)}\right)+\widetilde{X}\left(\beta^{(k)}\right) \beta^{(k)}\right] \\ &=\beta^{(k)}+\left[\widetilde{X}\left(\beta^{(k)}\right)^{\prime} \widetilde{X}\left(\beta^{(k)}\right)\right]^{-1} \widetilde{X}\left(\beta^{(k)}\right)^{\prime}\left[Y-f\left(X, \beta^{(k)}\right)\right] \end{aligned}
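This is the Gauss–Newton iteration. Here is a minimal R sketch of it, using the single-parameter toy model Y_i = βX_{1i} + β²X_{2i} + ε_i from above (simulated data with an illustrative true value β = 1.5):

set.seed(1)                                    # illustrative simulated data
n  <- 50
X1 <- runif(n); X2 <- runif(n)
Y  <- 1.5*X1 + 1.5^2*X2 + rnorm(n, 0, 0.1)
f      <- function(b) b*X1 + b^2*X2            # regression function f(X, beta)
f_grad <- function(b) X1 + 2*b*X2              # df/dbeta, i.e. X_tilde(beta)
b <- 0.1                                       # starting value beta^(1)
for (k in 1:100) {
  Xt    <- f_grad(b)
  b_new <- b + sum(Xt*(Y - f(b))) / sum(Xt^2)  # scalar Gauss-Newton update
  if (abs(b_new - b) < 1e-8) { b <- b_new; break }  # stop once |change| < delta
  b <- b_new
}
b                                              # converges near the true value 1.5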
It can be shown that as the sample size n → ∞ the resulting estimator is asymptotically normal, i.e.

\widetilde{\beta} \sim N\left(\beta, \hat{\sigma}^{2}\left[\widetilde{X}(\beta)^{\prime} \widetilde{X}(\beta)\right]^{-1}\right), \qquad \hat{\sigma}^{2}=\frac{S(\widetilde{\beta})}{n-1}
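As a quick sanity check of this covariance formula, here is a sketch on the same simulated toy model (the data and numbers are illustrative): the variance σ̂²[X̃'X̃]^{-1}, evaluated at the estimate, should essentially agree with what R's nls() reports through vcov().

set.seed(1)
n  <- 200
X1 <- runif(n); X2 <- runif(n)
Y  <- 1.5*X1 + 1.5^2*X2 + rnorm(n, 0, 0.1)
fit <- nls(Y ~ b*X1 + b^2*X2, start = list(b = 0.1))
Xt <- X1 + 2*coef(fit)*X2                      # gradient evaluated at the estimate
sigma2 <- sum(resid(fit)^2) / (n - 1)          # S(beta_hat)/(n - 1), one parameter
c(formula = sigma2 / sum(Xt^2), nls = vcov(fit)[1, 1])  # the two variances should agree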
3 Implementation of Nonlinear Least Squares
In R, the nls function can be used to carry out nonlinear least squares. As an example, take the Cobb-Douglas (C-D) production function: suppose a country's output depends on capital, labor, and total factor productivity,
Y = A K^{\alpha} L^{\beta} \mu
The following R code estimates the parameters α and β:
t <- 1:12  # time index
Y <- c(26.74, 34.81, 44.72, 57.46, 73.84, 88.45, 105.82, 126.16, 150.9, 181.6, 204.3, 222.8)  # output series
K <- c(23.66, 30.55, 38.12, 46.77, 56.45, 67.15, 78.92, 91.67, 105.5, 121.3, 128.6, 132.5)  # capital series
L <- c(26, 28, 32, 36, 41, 45, 48, 52, 56, 60, 66, 70)  # labor input series
Cdnls <- nls(Y ~ A*K^a*L^b, start = list(A = 0.1, a = 0.5, b = 0.5))  # nonlinear least squares; start gives the initial parameter values
summary(Cdnls)
#------------------- Output ---------------------------
#Formula: Y ~ A * K^a * L^b
Parameters:
Estimate Std. Error t value Pr(>|t|)
A 0.1129 0.0159 7.12 5.6e-05 ***
a 0.6568 0.0652 10.07 3.4e-06 ***
b 1.0298 0.1044 9.86 4.0e-06 ***
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.7 on 9 degrees of freedom
Number of iterations to convergence: 9
Achieved convergence tolerance: 7.55e-06
The estimates are α = 0.6568 and β = 1.0298. For comparison, take logs and estimate by ordinary least squares, i.e.
\ln Y = \ln A + \alpha \ln K + \beta \ln L + e
CDlm <- lm(log(Y) ~ log(K) + log(L))  # log-linear form
summary(CDlm)
#-------------------- Output --------------
Call:
lm(formula = log(Y) ~ log(K) + log(L))
Residuals:
Min 1Q Median 3Q Max
-0.02714 -0.00595 -0.00118 0.00764 0.02557
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.0737 0.2355 -8.80 1.0e-05 ***
log(K) 0.6258 0.0916 6.83 7.6e-05 ***
log(L) 1.0379 0.1621 6.40 0.00012 ***
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.0173 on 9 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 0.999
F-statistic: 9.16e+03 on 2 and 9 DF, p-value: 1.29e-15
The log-linear regression gives α = 0.6258 and β = 1.0379, so for the Cobb-Douglas function the log-transformed OLS estimates and the nonlinear least squares estimates are essentially the same. For models that cannot be linearized by taking logs, however, nonlinear least squares shows its real value. Consider the true model
y = 2\sin(x) + 4\cos(x)
We now run a small simulation:
set.seed(123)  # random seed
x <- seq(1, 100, by = 0.1)  # grid from 1 to 100 with step 0.1
e <- rnorm(length(x), 0, 1)  # standard normal errors, one per x
y <- 2*sin(x) + 4*cos(x) + e  # observed dependent variable
plot(x, y, type = "o")  # plot the simulated series
nls1 <- nls(y ~ a*sin(x) + b*cos(x),
            start = list(a = 0, b = 0))  # nonlinear least squares with starting values a = 0, b = 0
nls1
#------------- Output ------------------
Nonlinear regression model
model: y ~ a * sin(x) + b * cos(x)
data: parent.frame()
a b
1.92 4.03
residual sum-of-squares: 974
Number of iterations to convergence: 1
Achieved convergence tolerance: 6.73e-10
The estimates are a = 1.92 and b = 4.03, very close to the population values a = 2 and b = 4.
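As a small follow-up (reusing x, y and nls1 from the simulation above), the fitted curve can be overlaid on the data to check the fit visually:

plot(x, y, pch = 16, cex = 0.4)                # simulated observations
lines(x, predict(nls1), col = "red", lwd = 2)  # fitted curve a*sin(x) + b*cos(x)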
References
王斌会 (2015). 计量经济学建模及R语言应用 [M]. 北京大学出版社.