Newton's Method
1. Basic Principle
As with gradient descent, consider the unconstrained optimization problem:
$$ \min \quad f(x) $$ We look for a zero of the derivative to locate an extremum; if $f''(x_{k}) > 0$ at such a zero $x_{k}$, then $x_{k}$ is a local minimum of the function.
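For example, with $f(x) = x^{2} - 2x$: $$ f'(x) = 2x - 2 = 0 \;\Rightarrow\; x = 1, \qquad f''(1) = 2 > 0, $$ so $x = 1$ is a local (in fact global) minimum, with $f(1) = -1$.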
- Expand the objective in a second-order Taylor series around $x_{k}$: $$ f(x_{k+1}) \approx f(x_{k}) + f'(x_{k})(x_{k+1} - x_{k}) + \frac{1}{2}f''(x_{k})(x_{k+1}-x_{k})^{2} $$
- Set the derivative of this approximation with respect to $x_{k+1}$ to zero: $$ f'(x_{k+1}) \approx f'(x_{k}) + f''(x_{k})(x_{k+1}-x_{k}) = 0 $$
- Solving for $x_{k+1}$ gives the Newton update (sketched numerically below the list), where $g(x_{k}) = f'(x_{k})$ denotes the gradient: $$ \begin{aligned} x_{k+1} &= x_{k} - (f''(x_{k}))^{-1}f'(x_{k}) \\ &= x_{k} - (f''(x_{k}))^{-1}g(x_{k}) \end{aligned} $$
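A minimal numerical sketch of this update; the toy objective $f(x) = x^{4} - 3x^{2} + 2$ and its hand-coded derivatives are illustrative assumptions, not part of the original note:

```python
# 1-D Newton's method for minimization: x_{k+1} = x_k - f'(x_k) / f''(x_k).

def f_prime(x):           # f'(x) = 4x^3 - 6x
    return 4 * x**3 - 6 * x

def f_double_prime(x):    # f''(x) = 12x^2 - 6
    return 12 * x**2 - 6

def newton_minimize(x0, tol=1e-10, max_iter=50):
    x = x0
    for k in range(max_iter):
        step = f_prime(x) / f_double_prime(x)   # (f''(x_k))^{-1} g(x_k)
        x -= step
        if abs(step) < tol:                     # stop once the update is negligible
            return x, k + 1
    return x, max_iter

x_star, iters = newton_minimize(x0=2.0)
print(x_star, iters)   # ≈ 1.2247 (= sqrt(3/2)) after only a few iterations
```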
2. Comparison with Gradient Descent
- Gradient descent (derivation recapped below; a numerical sketch follows this list): $$ x_{k+1} = x_{k} - \alpha \times g(x_{k}) $$
  - Recap of its derivation: for the unconstrained problem $$ \min \quad f(x) $$ take a first-order Taylor expansion of the objective at $x_{k}$: $$ \begin{aligned} f(x_{k+1}) &\approx f(x_{k}) + g(x_{k})(x_{k+1} - x_{k}) \\ g(x_{k}) &= f'(x_{k}) \end{aligned} $$ Gradient descent moves from the current point in the direction of steepest descent. Writing $x_{k+1} - x_{k} = \alpha \vec{v}$ with $\vec{v}$ a unit vector: $$ \begin{aligned} f(x_{k+1}) - f(x_{k}) &\approx g(x_{k}) \cdot \alpha \vec{v} \\ & = \alpha \times ||g(x_{k})|| \times ||\vec{v}|| \times \cos(\theta) \end{aligned} $$ To make $f(x_{k+1}) - f(x_{k})$ negative and as large as possible in magnitude, we need $\cos(\theta) = -1$, i.e. $\vec{v}$ points along $-g(x_{k})$, which gives: $$ x_{k+1} = x_{k} - \alpha \times g(x_{k}) $$
  - Its limitation: near an extremum the gradient, and hence the update step, becomes very small, so the number of iterations can be large.
- Newton's method: $$ x_{k+1} = x_{k} - (f''(x_{k}))^{-1}g(x_{k}) $$
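A matching sketch of the gradient-descent update, reusing the same toy objective as above; the step size $\alpha = 0.01$ is a hand-picked illustrative choice:

```python
# Gradient descent on the same toy objective: x_{k+1} = x_k - alpha * g(x_k).

def grad(x):              # g(x) = f'(x) = 4x^3 - 6x
    return 4 * x**3 - 6 * x

def gradient_descent(x0, alpha=0.01, tol=1e-10, max_iter=100_000):
    x = x0
    for k in range(max_iter):
        step = alpha * grad(x)
        x -= step
        if abs(step) < tol:   # near the minimum the gradient, and thus the step, shrinks
            return x, k + 1
    return x, max_iter

x_star, iters = gradient_descent(x0=2.0)
print(x_star, iters)   # also ≈ 1.2247, but after far more iterations than Newton's method
```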
So Newton's method in effect uses the Hessian matrix to determine a step size, and it converges in far fewer iterations than gradient descent; the trade-off is that computing the Hessian and its inverse is considerably more difficult.
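In several dimensions the update becomes $x_{k+1} = x_{k} - H(x_{k})^{-1} g(x_{k})$ with $H$ the Hessian matrix. A sketch of a single such step; the 2-D quadratic $f(x, y) = x^{2} + 3y^{2} + xy$ and its hand-derived gradient and Hessian are illustrative assumptions, not from the note:

```python
import numpy as np

# One multivariate Newton step: solve H(x_k) d = -g(x_k), then x_{k+1} = x_k + d.

def gradient(p):
    x, y = p
    return np.array([2 * x + y, x + 6 * y])   # grad f = [2x + y, x + 6y]

def hessian(p):
    # Constant for a quadratic objective.
    return np.array([[2.0, 1.0],
                     [1.0, 6.0]])

def newton_step(p):
    g = gradient(p)
    H = hessian(p)
    d = np.linalg.solve(H, -g)   # cheaper and more stable than forming H^{-1} explicitly
    return p + d

p0 = np.array([3.0, -2.0])
print(newton_step(p0))   # a quadratic objective is minimized in a single step: [0. 0.]
```

Solving the linear system avoids forming $H^{-1}$, but for a dense $n \times n$ Hessian it still costs on the order of $n^{3}$ operations, which is the practical bottleneck noted above.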
3. Limitations
- Requires the objective function to be twice differentiable
- For a non-convex function the Hessian is not necessarily invertible (for example, $f(x) = x^{3}$ has $f''(0) = 0$, so the Newton step is undefined there)
- When the problem or dataset is large, the cost of computing the Hessian becomes high