Paper Title

Stochastic optimization with momentum: convergence, fluctuations, and traps avoidance

Authors

Barakat, A., Bianchi, P., Hachem, W., Schechtman, Sh.

Abstract

In this paper, a general stochastic optimization procedure is studied, unifying several variants of the stochastic gradient descent such as, among others, the stochastic heavy ball method, the Stochastic Nesterov Accelerated Gradient algorithm (S-NAG), and the widely used Adam algorithm. The algorithm is seen as a noisy Euler discretization of a non-autonomous ordinary differential equation, recently introduced by Belotto da Silva and Gazeau, which is analyzed in depth. Assuming that the objective function is non-convex and differentiable, the stability and the almost sure convergence of the iterates to the set of critical points are established. A noteworthy special case is the convergence proof of S-NAG in a non-convex setting. Under some assumptions, the convergence rate is provided under the form of a Central Limit Theorem. Finally, the non-convergence of the algorithm to undesired critical points, such as local maxima or saddle points, is established. Here, the main ingredient is a new avoidance of traps result for non-autonomous settings, which is of independent interest.
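To make the setting concrete, here is a minimal Python sketch of two of the momentum methods the abstract names, the stochastic heavy ball method and Adam, run on a toy non-convex double-well objective. The update rules below are the standard textbook forms of these algorithms, not the paper's unified scheme or its non-autonomous ODE analysis; the objective, the decreasing step-size schedule, and the Gaussian gradient noise are illustrative assumptions. Note that (0, 0) is a saddle point of this objective, an example of the kind of undesired critical point that the avoidance-of-traps result concerns.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Toy non-convex objective: a double well with minima at (+-1, 0) and a saddle at (0, 0).
    return (x[0] ** 2 - 1.0) ** 2 + x[1] ** 2

def stochastic_grad(x, noise_std=0.1):
    # Exact gradient of f plus Gaussian noise, standing in for a stochastic gradient oracle.
    g = np.array([4.0 * x[0] * (x[0] ** 2 - 1.0), 2.0 * x[1]])
    return g + noise_std * rng.standard_normal(2)

def stochastic_heavy_ball(x0, steps=20000, beta=0.9, gamma0=0.01):
    # Classic stochastic heavy ball: a momentum buffer plus a noisy gradient step,
    # with a vanishing step size gamma_k = gamma0 / sqrt(k) (illustrative choice).
    x, v = np.array(x0, dtype=float), np.zeros(2)
    for k in range(1, steps + 1):
        gamma = gamma0 / np.sqrt(k)
        v = beta * v - gamma * stochastic_grad(x)
        x = x + v
    return x

def adam(x0, steps=20000, alpha0=0.01, b1=0.9, b2=0.999, eps=1e-8):
    # Standard Adam update with bias correction, again with a decreasing step size.
    x = np.array(x0, dtype=float)
    m, s = np.zeros(2), np.zeros(2)
    for k in range(1, steps + 1):
        g = stochastic_grad(x)
        m = b1 * m + (1 - b1) * g          # first-moment (momentum) estimate
        s = b2 * s + (1 - b2) * g ** 2     # second-moment estimate
        m_hat, s_hat = m / (1 - b1 ** k), s / (1 - b2 ** k)
        alpha = alpha0 / np.sqrt(k)
        x = x - alpha * m_hat / (np.sqrt(s_hat) + eps)
    return x

# Both runs start away from the critical points; the final iterates typically land
# near one of the minima (+-1, 0) rather than the saddle (0, 0).
print("heavy ball:", stochastic_heavy_ball([2.0, 1.5]))
print("adam      :", adam([2.0, 1.5]))
```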
