Paper Title

An Optimal Multistage Stochastic Gradient Method for Minimax Problems

Authors

Alireza Fallah, Asuman Ozdaglar, Sarath Pattathil

Abstract

In this paper, we study the minimax optimization problem in the smooth and strongly convex-strongly concave setting when we have access to noisy estimates of gradients. In particular, we first analyze the stochastic Gradient Descent Ascent (GDA) method with constant stepsize and show that it converges to a neighborhood of the solution of the minimax problem. We further provide tight bounds on the convergence rate and the size of this neighborhood. Next, we propose a multistage variant of stochastic GDA (M-GDA) that runs in multiple stages with a particular learning rate decay schedule and converges to the exact solution of the minimax problem. We show that M-GDA achieves the lower bounds in terms of noise dependence without any assumptions on the knowledge of the noise characteristics. We also show that M-GDA obtains a linear decay rate with respect to the initial error, although its dependence on the condition number is suboptimal. To improve this dependence, we apply the multistage machinery to the stochastic Optimistic Gradient Descent Ascent (OGDA) algorithm and propose the M-OGDA algorithm, which also achieves the optimal linear decay rate with respect to the initial error. To the best of our knowledge, this method is the first to simultaneously achieve the best dependence on the noise characteristics as well as on the initial error and condition number.
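To make the two ingredients of the abstract concrete, below is a minimal NumPy sketch of the constant-stepsize stochastic GDA update and a multistage wrapper that shrinks the stepsize between stages. The Gaussian noise model, the geometric halving/doubling schedule, and the toy quadratic objective are illustrative assumptions, not the exact oracle, schedule, or setting analyzed in the paper.

```python
import numpy as np

def stochastic_gda(grad_x, grad_y, x, y, stepsize, iters, rng, noise_std=0.1):
    """One stage of stochastic GDA with a constant stepsize.

    Additive Gaussian noise stands in for the noisy gradient oracle; with a
    constant stepsize the iterates only reach a noise-dominated neighborhood
    of the saddle point, as the abstract describes.
    """
    for _ in range(iters):
        gx = grad_x(x, y) + noise_std * rng.standard_normal(x.shape)
        gy = grad_y(x, y) + noise_std * rng.standard_normal(y.shape)
        x = x - stepsize * gx  # descent step on the minimization variable
        y = y + stepsize * gy  # ascent step on the maximization variable
    return x, y

def multistage_gda(grad_x, grad_y, x, y, stepsize, iters, n_stages, rng):
    """Multistage wrapper in the spirit of M-GDA: each stage restarts from
    the previous iterate with a smaller stepsize and a larger iteration
    budget. The halving/doubling schedule here is illustrative only."""
    for _ in range(n_stages):
        x, y = stochastic_gda(grad_x, grad_y, x, y, stepsize, iters, rng)
        stepsize *= 0.5
        iters *= 2
    return x, y

# Toy strongly convex-strongly concave objective
# f(x, y) = 0.5*x**2 + x*y - 0.5*y**2, with unique saddle point (0, 0).
rng = np.random.default_rng(0)
x, y = multistage_gda(lambda x, y: x + y,   # df/dx
                      lambda x, y: x - y,   # df/dy
                      np.array([5.0]), np.array([-3.0]),
                      stepsize=0.2, iters=100, n_stages=6, rng=rng)
print(x, y)  # both coordinates should end up close to 0
```

The optimistic (OGDA) variant that M-OGDA builds on would replace each step with an extrapolated update that reuses the previous stochastic gradient, e.g. x ← x − η(2·g_t − g_{t−1}); the same multistage wrapper applies unchanged.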
