Newton's Method

1. Basic Principle


Likewise, for the unconstrained optimization problem:

$$ \min \quad f(x) $$

We find the zeros of its derivative to obtain the extrema of the function; moreover, if $f''(x_{k}) > 0$, then $x_{k}$ is a local minimum.

  1. Expand the objective function in a second-order Taylor series around $x_{k}$: $$ f(x_{k+1}) = f(x_{k}) + f'(x_{k})(x_{k+1} - x_{k}) + \frac{1}{2}f''(x_{k})(x_{k+1}-x_{k})^{2} $$
  2. Set the first derivative of this expansion to zero: $$ \begin{aligned} f'(x_{k+1}) &= f'(x_{k}) + f''(x_{k})(x_{k+1}-x_{k}) \\ &= 0 \end{aligned} $$
  3. Solve for the update: $$ \begin{aligned} x_{k+1} &= x_{k} - (f''(x_{k}))^{-1}f'(x_{k}) \\ &= x_{k} - (f''(x_{k}))^{-1}g(x_{k}) \end{aligned} $$
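
The update rule above can be sketched directly in one dimension. This is a minimal sketch; the quadratic test function, starting point, and tolerance are illustrative assumptions, not from the original note.

```python
def newton_minimize(df, d2f, x0, tol=1e-8, max_iter=100):
    """Minimize f by applying Newton's method to the zero of f'.

    df: first derivative f', d2f: second derivative f''.
    """
    x = x0
    for _ in range(max_iter):
        step = df(x) / d2f(x)  # (f''(x_k))^{-1} f'(x_k)
        x = x - step
        if abs(step) < tol:
            break
    return x

# Illustrative example: f(x) = (x - 1)^2, so f'(x) = 2(x - 1), f''(x) = 2
x_star = newton_minimize(lambda x: 2 * (x - 1), lambda x: 2.0, x0=5.0)
print(x_star)  # 1.0: for a quadratic, Newton's method hits the minimum in one step
```

Because the second-order Taylor expansion of a quadratic is exact, a single step lands on the minimizer; for general functions the iteration converges over several steps.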

2. Comparison with Gradient Descent


  • 📄 Gradient Descent

    1. Basic Principle


    For the unconstrained optimization problem:

    $$ \min \quad f(x) $$

    Expand the objective function in a first-order Taylor series around $x_{k}$:

    $$ \begin{aligned} f(x_{k+1}) &= f(x_{k}) + g(x_{k})(x_{k+1} - x_{k}) \\ g(x_{k}) &= f'(x_{k}) \end{aligned} $$

    Gradient descent seeks to move from the current point in the direction of greatest decrease of $f(x)$. Writing the step as $\alpha \vec{v}$:

    $$ \begin{aligned} f(x_{k+1}) - f(x_{k}) &= g(x_{k}) \cdot \alpha \vec{v} \\ &= \alpha \times ||g(x_{k})|| \times ||\vec{v}|| \times \cos(\theta) \end{aligned} $$

    To make $f(x_{k+1}) - f(x_{k})$ negative with the largest possible magnitude, we need $\cos(\theta) = -1$, i.e. $\vec{v}$ points along $-g(x_{k})$, which gives:

    $$ x_{k+1} = x_{k} - \alpha \times g(x_{k}) $$

    2. Limitations


    • Near an extremum the gradient, and with it the update step, becomes very small, which can lead to an excessive number of iterations.

    3. References


  • Gradient descent: $$ x_{k+1} = x_{k} - \alpha \times g(x_{k}) $$
  • Newton's method: $$ x_{k+1} = x_{k} - (f''(x_{k}))^{-1}g(x_{k}) $$

So Newton's method effectively uses the second derivative (in higher dimensions, the Hessian matrix) to determine the step size, and it typically converges in far fewer iterations than gradient descent; the trade-off is that computing and inverting the Hessian is considerably harder.
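
To make the speed difference concrete, the two update rules can be compared on the same one-dimensional function. This is a minimal sketch; the quadratic test function, learning rate, and tolerance are illustrative assumptions.

```python
def newton_iterations(df, d2f, x0, tol=1e-8):
    """Count Newton iterations until |f'(x)| < tol."""
    x, n = x0, 0
    while abs(df(x)) > tol:
        x = x - df(x) / d2f(x)  # step chosen by the curvature f''(x)
        n += 1
    return n

def gd_iterations(g, x0, alpha=0.1, tol=1e-8):
    """Count gradient descent iterations until |f'(x)| < tol."""
    x, n = x0, 0
    while abs(g(x)) > tol:
        x = x - alpha * g(x)  # fixed learning rate alpha
        n += 1
    return n

# Illustrative example: f(x) = (x - 1)^2
df = lambda x: 2 * (x - 1)
n_newton = newton_iterations(df, lambda x: 2.0, x0=5.0)
n_gd = gd_iterations(df, x0=5.0)
print(n_newton, n_gd)  # Newton: 1 iteration; gradient descent: dozens
```

Gradient descent shrinks the remaining error by a constant factor per step under this fixed learning rate, while Newton's curvature-scaled step solves the quadratic exactly.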

3. Limitations


  • The objective function must be twice differentiable.
  • For a non-convex function, the Hessian may not be invertible.
  • For large-scale problems, computing the Hessian is expensive.
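
In higher dimensions, one standard way to soften the inversion cost is to solve the linear system $H \Delta x = g$ rather than forming $H^{-1}$ explicitly. A minimal sketch, where the quadratic objective is an illustrative assumption:

```python
import numpy as np

def newton_step(grad, hess, x):
    """One multivariate Newton step: solve H(x) d = g(x) with a linear
    solver instead of explicitly inverting the Hessian (cheaper, more stable)."""
    return x - np.linalg.solve(hess(x), grad(x))

# Illustrative quadratic f(x) = 0.5 x^T A x - b^T x, minimized where A x = b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_star = newton_step(lambda x: A @ x - b, lambda x: A, np.zeros(2))
print(x_star)  # satisfies A @ x_star ≈ b up to floating point
```

For a non-convex function the solve can still fail when the Hessian is singular, which is exactly the second limitation above.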

4. References