Apollo: An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization

Paper Title

Apollo: An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization

Author

Ma, Xuezhe

Abstract

In this paper, we introduce Apollo, a quasi-Newton method for nonconvex stochastic optimization, which dynamically incorporates the curvature of the loss function by approximating the Hessian via a diagonal matrix. Importantly, the update and storage of the diagonal approximation of the Hessian are as efficient as those of adaptive first-order optimization methods, with linear complexity in both time and memory. To handle nonconvexity, we replace the Hessian with its rectified absolute value, which is guaranteed to be positive-definite. Experiments on three tasks of vision and language show that Apollo achieves significant improvements over other stochastic optimization methods, including SGD and variants of Adam, in terms of both convergence speed and generalization performance. The implementation of the algorithm is available at https://github.com/XuezheMax/apollo.
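
To make the two ideas in the abstract concrete (a diagonal curvature estimate and its rectified absolute value), here is a minimal sketch in plain Python. It is not the author's implementation. The abstract does not state how the diagonal is updated, so the sketch assumes a weak secant condition, s^T B s = s^T y, satisfied with the smallest change to the previous diagonal. The function names, the `sigma` floor, and the `lr` step size are hypothetical and chosen only for illustration.

```python
def weak_secant_update(b, s, y):
    """Return a new diagonal b_new with sum(b_new[i] * s[i]**2) == sum(s[i] * y[i]).

    b: current diagonal curvature estimate (list of floats)
    s: parameter difference x_new - x_old
    y: gradient difference g_new - g_old
    Assumption: least-change correction proportional to s[i]**2.
    """
    s4 = sum(si ** 4 for si in s)
    if s4 == 0.0:
        return list(b)  # no movement, nothing to learn
    residual = sum(si * yi for si, yi in zip(s, y)) - sum(
        bi * si * si for bi, si in zip(b, s)
    )
    scale = residual / s4
    return [bi + scale * si * si for bi, si in zip(b, s)]


def rectified_abs(b, sigma=1.0):
    """Entrywise max(|b_i|, sigma); every entry is >= sigma > 0, so the
    resulting diagonal matrix is positive-definite even when b has
    negative or zero entries (the nonconvex case)."""
    return [max(abs(bi), sigma) for bi in b]


def quasi_newton_step(x, grad, b, lr=1.0, sigma=1.0):
    """One parameter-wise step: x_i - lr * grad_i / max(|b_i|, sigma)."""
    d = rectified_abs(b, sigma)
    return [xi - lr * gi / di for xi, gi, di in zip(x, grad, d)]
```

Every operation here is elementwise or a single reduction over the parameters, so time and memory are linear in the number of parameters, which is the cost profile the abstract compares to adaptive first-order methods.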
