Paper Title

Stochastic optimization with momentum: convergence, fluctuations, and traps avoidance

Authors

Barakat, A., Bianchi, P., Hachem, W., Schechtman, Sh.

Abstract

In this paper, a general stochastic optimization procedure is studied, unifying several variants of the stochastic gradient descent such as, among others, the stochastic heavy ball method, the Stochastic Nesterov Accelerated Gradient algorithm (S-NAG), and the widely used Adam algorithm. The algorithm is seen as a noisy Euler discretization of a non-autonomous ordinary differential equation, recently introduced by Belotto da Silva and Gazeau, which is analyzed in depth. Assuming that the objective function is non-convex and differentiable, the stability and the almost sure convergence of the iterates to the set of critical points are established. A noteworthy special case is the convergence proof of S-NAG in a non-convex setting. Under some assumptions, the convergence rate is provided under the form of a Central Limit Theorem. Finally, the non-convergence of the algorithm to undesired critical points, such as local maxima or saddle points, is established. Here, the main ingredient is a new avoidance of traps result for non-autonomous settings, which is of independent interest.
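To make the setting concrete, here is a minimal Python sketch of two of the momentum methods the abstract names, the stochastic heavy ball method and Adam, run on a toy non-convex double-well objective. The update rules below are the standard textbook forms of these algorithms, not the paper's unified scheme or its non-autonomous ODE analysis; the objective, the decreasing step-size schedule, and the Gaussian gradient noise are illustrative assumptions. Note that (0, 0) is a saddle point of this objective, an example of the kind of undesired critical point that the avoidance-of-traps result concerns.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Toy non-convex objective: a double well with minima at (+-1, 0) and a saddle at (0, 0).
    return (x[0] ** 2 - 1.0) ** 2 + x[1] ** 2

def stochastic_grad(x, noise_std=0.1):
    # Exact gradient of f plus Gaussian noise, standing in for a stochastic gradient oracle.
    g = np.array([4.0 * x[0] * (x[0] ** 2 - 1.0), 2.0 * x[1]])
    return g + noise_std * rng.standard_normal(2)

def stochastic_heavy_ball(x0, steps=20000, beta=0.9, gamma0=0.01):
    # Classic stochastic heavy ball: a momentum buffer plus a noisy gradient step,
    # with a vanishing step size gamma_k = gamma0 / sqrt(k) (illustrative choice).
    x, v = np.array(x0, dtype=float), np.zeros(2)
    for k in range(1, steps + 1):
        gamma = gamma0 / np.sqrt(k)
        v = beta * v - gamma * stochastic_grad(x)
        x = x + v
    return x

def adam(x0, steps=20000, alpha0=0.01, b1=0.9, b2=0.999, eps=1e-8):
    # Standard Adam update with bias correction, again with a decreasing step size.
    x = np.array(x0, dtype=float)
    m, s = np.zeros(2), np.zeros(2)
    for k in range(1, steps + 1):
        g = stochastic_grad(x)
        m = b1 * m + (1 - b1) * g          # first-moment (momentum) estimate
        s = b2 * s + (1 - b2) * g ** 2     # second-moment estimate
        m_hat, s_hat = m / (1 - b1 ** k), s / (1 - b2 ** k)
        alpha = alpha0 / np.sqrt(k)
        x = x - alpha * m_hat / (np.sqrt(s_hat) + eps)
    return x

# Both runs start away from the critical points; the final iterates typically land
# near one of the minima (+-1, 0) rather than the saddle (0, 0).
print("heavy ball:", stochastic_heavy_ball([2.0, 1.5]))
print("adam      :", adam([2.0, 1.5]))
```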
