Gauss-Newton Method
1. Least-Squares Problems
- Linear least squares
Fit the linear model $y = \theta^{T} x,\ x\in R^{n},\ y\in R$ to given input-output data. With $m$ data pairs, build the residual model:
$$ f(\theta) = \frac{1}{2} ||A\theta-b||^{2} $$where
$$ \begin{aligned} A &= [x_{1},x_{2},\dots,x_{m}]^{T}, \quad x_{i} \in R^{n}\\ b &= [y_{1},y_{2},\dots,y_{m}]^{T}, \quad y_{i} \in R \end{aligned} $$To solve the linear least-squares problem, differentiate $f(\theta)$ directly and set the derivative to zero:
$$ \begin{aligned} \frac{\partial f(\theta)}{\partial \theta} &= A^{T}(A\theta-b) \\ & = 0 \end{aligned} $$Solving directly gives:
$$ \theta = (A^{T}A)^{-1}A^{T}b $$provided $A$ has full column rank, so that $A^{T}A$ is invertible.
- Nonlinear least squares
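Before turning to the nonlinear case, the closed-form solution $\theta = (A^{T}A)^{-1}A^{T}b$ derived above can be checked numerically. This is a minimal NumPy sketch with synthetic data; the data and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = theta_true^T x plus small noise, m = 100 samples, n = 3.
theta_true = np.array([2.0, -1.0, 0.5])
A = rng.normal(size=(100, 3))                  # rows are the x_i^T
b = A @ theta_true + 0.01 * rng.normal(size=100)

# Normal-equations solution theta = (A^T A)^{-1} A^T b, implemented by
# solving the linear system rather than forming the explicit inverse.
theta = np.linalg.solve(A.T @ A, A.T @ b)

print(theta)  # close to theta_true
```

In practice `np.linalg.lstsq(A, b)` is the numerically safer way to solve the same problem.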
For the objective function:
$$ F(x) = \frac{1}{2} ||f(x)||^{2} = \frac{1}{2}f^{T}(x)f(x), \quad f(x) = [f_{1}(x),f_{2}(x),\dots,f_{m}(x)]^{T}, \quad x\in R^{n} $$expand $f(x)$ in a first-order Taylor series about $x_{k}$:
$$ f(x_{k+1}) \approx f(x_{k}) + J(x_{k}) (x_{k+1}-x_{k}) $$Note that $J(x_{k})$ here is the Jacobian (first derivative) of $f(x)$, not of $F(x)$. Then:
$$ \begin{aligned} \frac{\partial F(x_{k+1})}{\partial x_{k+1}} &\approx J^{T}(x_{k})f(x_{k+1}) \\ &= J^{T}(x_{k})f(x_{k}) + J^{T}(x_{k})J(x_{k})(x_{k+1} - x_{k}) \end{aligned} $$Following the idea of the Gauss-Newton method, we require this first derivative to be zero, which gives the iteration update:
$$ x_{k+1} = x_{k} - (J^{T}(x_{k})J(x_{k}))^{-1}J^{T}(x_{k})f(x_{k}) $$Meanwhile, the first derivative of the original objective $F(x)$ is:
$$ \begin{aligned} g(x) &= F'(x) \\ &= J^{T}(x)f(x) \end{aligned} $$so the Gauss-Newton step update can also be written as:
$$ x_{k+1} = x_{k} - (J^{T}(x_{k})J(x_{k}))^{-1}g(x_{k}) $$Note the comparison with Newton's method (see: Newton's Method > 2. Comparison with Gradient Descent).
As this shows, the Gauss-Newton method uses $J^{T}(x_{k})J(x_{k})$ as an approximation of the Hessian $F''(x_{k})$ of the original objective $F(x_{k})$. The exact Hessian is $F''(x) = J^{T}(x)J(x) + \sum_{i=1}^{m} f_{i}(x)\nabla^{2} f_{i}(x)$; Gauss-Newton drops the second term, which is negligible when the residuals $f_{i}(x)$ are small near the solution, and thereby avoids computing second derivatives.
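The whole iteration can be sketched in a few lines of NumPy. The model $y = e^{at+b}$, the data, and all names below are illustrative assumptions, not from the original note; the loop solves the normal equations $J^{T}J\,\Delta x = -J^{T}f$ instead of forming the inverse explicitly:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data from y = exp(a*t + b) with a = 0.3, b = 0.1 (illustrative).
a_true, b_true = 0.3, 0.1
t = np.linspace(0.0, 2.0, 50)
y = np.exp(a_true * t + b_true) + 0.001 * rng.normal(size=t.size)

def residual(x):
    a, b = x
    return np.exp(a * t + b) - y            # f(x), shape (m,)

def jacobian(x):
    a, b = x
    e = np.exp(a * t + b)
    return np.column_stack((t * e, e))      # J(x), shape (m, n)

x = np.array([0.0, 0.0])                    # initial guess
for _ in range(50):
    f = residual(x)
    J = jacobian(x)
    # Gauss-Newton step: solve J^T J dx = -J^T f.
    dx = np.linalg.solve(J.T @ J, -J.T @ f)
    x = x + dx
    if np.linalg.norm(dx) < 1e-10:
        break

print(x)  # close to (a_true, b_true)
```

Solving the linear system is numerically preferable to computing $(J^{T}J)^{-1}$ directly; real solvers go further and use a QR or Cholesky factorization.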
2. Limitations of the Gauss-Newton Method
- $J^{T}(x_{k})J(x_{k})$ is only guaranteed to be positive semidefinite; it may turn out to be singular, in which case the inversion fails and the next iterate $x_{k+1}$ cannot be computed;
- When the computed $x_{k+1}$ differs too much from $x_{k}$, the local second-order approximation (built from the linearization of $f$) no longer represents the function well, and the iteration may fail to converge.
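The first limitation can be reproduced directly: if two parameters enter the residual only through their sum (a hypothetical, deliberately degenerate model), the Jacobian columns are linearly dependent, so $J^{T}J$ is singular and the Gauss-Newton step is undefined:

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0])

# Hypothetical over-parameterized residual f_i(a, b) = (a + b) - y_i:
# the parameters only enter through their sum, so the Jacobian columns
# d f_i/da and d f_i/db are identical.
m = y.size
J = np.column_stack((np.ones(m), np.ones(m)))
f = np.zeros(m) - y                        # residual at (a, b) = (0, 0)

JtJ = J.T @ J                              # [[3, 3], [3, 3]]
print(np.linalg.matrix_rank(JtJ))          # prints 1: rank-deficient

try:
    dx = np.linalg.solve(JtJ, -J.T @ f)    # Gauss-Newton step fails here
except np.linalg.LinAlgError as err:
    print("Gauss-Newton step undefined:", err)
```

This is exactly the failure mode the first bullet describes: the semidefinite $J^{T}J$ has no inverse, so the next iterate cannot be computed.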