Paper Title


Convergence rates and approximation results for SGD and its continuous-time counterpart

Authors

Fontaine, Xavier, De Bortoli, Valentin, Durmus, Alain

Abstract


This paper proposes a thorough theoretical analysis of Stochastic Gradient Descent (SGD) with non-increasing step sizes. First, we show that the recursion defining SGD can be provably approximated by solutions of a time-inhomogeneous Stochastic Differential Equation (SDE) using an appropriate coupling. In the specific case of batch noise we refine our results using recent advances in Stein's method. Then, motivated by recent analyses of deterministic and stochastic optimization methods through their continuous counterparts, we study the long-time behavior of the continuous processes at hand and establish non-asymptotic bounds. To that purpose, we develop new comparison techniques which are of independent interest. Adapting these techniques to the discrete setting, we show that the same results hold for the corresponding SGD sequences. In our analysis, we notably improve non-asymptotic bounds in the convex setting for SGD under weaker assumptions than those considered in previous works. Finally, we also establish finite-time convergence results under various conditions, including relaxations of the famous Łojasiewicz inequality, which can be applied to a class of non-convex functions.
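To make the objects mentioned in the abstract concrete, here is a minimal sketch in standard notation; the symbols and exact forms below are illustrative assumptions, not the paper's own definitions.

```latex
% SGD recursion with non-increasing step sizes (\gamma_k),
% where \varepsilon_{k+1} is the stochastic gradient noise:
\theta_{k+1} = \theta_k - \gamma_{k+1}\,\bigl(\nabla f(\theta_k) + \varepsilon_{k+1}\bigr)

% A time-inhomogeneous SDE counterpart, coupled to the recursion,
% with (B_t) a Brownian motion and \Sigma a noise covariance:
\mathrm{d}\theta_t = -\gamma(t)\,\nabla f(\theta_t)\,\mathrm{d}t
  + \gamma(t)\,\Sigma(\theta_t)^{1/2}\,\mathrm{d}B_t

% A Lojasiewicz-type inequality (the paper considers relaxations),
% which controls convergence for a class of non-convex f:
\|\nabla f(\theta)\|^2 \ge c\,\bigl(f(\theta) - \min f\bigr)^{\alpha}
```

Under an inequality of this last type, descent on $f$ can be converted into a rate on $f(\theta_k) - \min f$ without assuming convexity.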
