Paper Title
On Almost Sure Convergence Rates of Stochastic Gradient Methods
Paper Authors
Paper Abstract
The vast majority of convergence rate analyses for stochastic gradient methods in the literature focus on convergence in expectation, whereas trajectory-wise almost sure convergence is important to ensure that any individual instantiation of a stochastic algorithm converges with probability one. Here we provide a unified almost sure convergence rate analysis for the stochastic gradient descent (SGD), stochastic heavy-ball (SHB), and stochastic Nesterov's accelerated gradient (SNAG) methods. We show, for the first time, that the almost sure convergence rates obtained for these stochastic gradient methods on strongly convex functions are arbitrarily close to their best possible convergence rates. For non-convex objective functions, we show not only that a weighted average of the squared gradient norms converges to zero almost surely, but also that the last iterates of the algorithms do. We further provide a last-iterate almost sure convergence rate analysis for stochastic gradient methods on weakly convex smooth functions, in contrast with most existing results in the literature, which only provide convergence in expectation for a weighted average of the iterates.
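For reference, a minimal sketch of the update rules behind the three methods named above, assuming their standard parameterizations (the paper's exact step-size and momentum schedules may differ); here $x_k$ is the iterate, $g_k(\cdot)$ a stochastic gradient evaluated at the indicated point, $\alpha_k > 0$ the step size, and $\beta_k \in [0,1)$ the momentum parameter:

$$
\begin{aligned}
\text{SGD:} \quad & x_{k+1} = x_k - \alpha_k\, g_k(x_k), \\
\text{SHB:} \quad & x_{k+1} = x_k - \alpha_k\, g_k(x_k) + \beta_k\,(x_k - x_{k-1}), \\
\text{SNAG:} \quad & y_k = x_k + \beta_k\,(x_k - x_{k-1}), \qquad x_{k+1} = y_k - \alpha_k\, g_k(y_k).
\end{aligned}
$$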