Paper Title
Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization
Paper Authors
Paper Abstract
Stochastic gradient Langevin dynamics is one of the most fundamental algorithms for solving the sampling and non-convex optimization problems that appear in several machine learning applications. In particular, its variance-reduced versions have recently gained attention. In this paper, we study two variants of this kind, namely, Stochastic Variance Reduced Gradient Langevin Dynamics and Stochastic Recursive Gradient Langevin Dynamics. We prove their convergence to the target distribution in terms of KL-divergence under the sole assumptions of smoothness and the log-Sobolev inequality, which are weaker conditions than those used in prior works on these algorithms. With the batch size and the inner loop length set to $\sqrt{n}$, the gradient complexity to achieve an $\varepsilon$-precision is $\tilde{O}((n + dn^{1/2}\varepsilon^{-1})\gamma^2 L^2 \alpha^{-2})$, which improves on all previous analyses. We also show some essential applications of our results to non-convex optimization.
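To make the algorithmic setup in the abstract concrete, below is a minimal sketch of a variance-reduced Langevin update in the style of SVRG-LD, with the batch size and inner loop length both set to $\sqrt{n}$ as described above. The function name `svrg_ld`, the callable `grad_fi`, and all parameter defaults are hypothetical illustrations, not the authors' implementation; the step size, number of outer loops, and the target $\pi(x) \propto \exp(-f(x))$ (unit temperature) are simplifying assumptions.

```python
import numpy as np

def svrg_ld(grad_fi, x0, n, step=1e-3, batch=None, inner_len=None,
            n_outer=50, rng=None):
    """Hypothetical sketch of Stochastic Variance Reduced Gradient
    Langevin Dynamics (SVRG-LD) targeting pi(x) ~ exp(-f(x)),
    where f(x) = (1/n) * sum_i f_i(x).

    grad_fi(x, idx) is an assumed user-supplied callable returning the
    average gradient of f_i over the index array `idx`.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x0.shape[0]
    batch = int(np.sqrt(n)) if batch is None else batch              # b = sqrt(n)
    inner_len = int(np.sqrt(n)) if inner_len is None else inner_len  # m = sqrt(n)

    x = x0.copy()
    for _ in range(n_outer):
        snapshot = x.copy()
        full_grad = grad_fi(snapshot, np.arange(n))   # full gradient at the snapshot point
        for _ in range(inner_len):
            idx = rng.choice(n, size=batch, replace=False)
            # variance-reduced estimator: minibatch gradient at x,
            # corrected by the snapshot minibatch gradient and the full gradient
            v = grad_fi(x, idx) - grad_fi(snapshot, idx) + full_grad
            noise = rng.standard_normal(d)
            # Langevin step: gradient move plus injected Gaussian noise
            x = x - step * v + np.sqrt(2.0 * step) * noise
    return x
```

The recursive-gradient variant mentioned in the abstract differs in how the estimator `v` is built (it is updated recursively within the inner loop rather than anchored at a fixed snapshot), but the overall outer/inner-loop structure and the injected Gaussian noise are the same.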