Paper Title
Universal Gradient Descent Ascent Method for Nonconvex-Nonconcave Minimax Optimization
Paper Authors
Paper Abstract
Nonconvex-nonconcave minimax optimization has received intense attention over the last decade due to its broad applications in machine learning. Most existing algorithms rely on one-sided information, such as the convexity (resp. concavity) of the primal (resp. dual) functions, or other specific structures, such as the Polyak-Łojasiewicz (PŁ) and Kurdyka-Łojasiewicz (KŁ) conditions. However, verifying these regularity conditions is challenging in practice. To meet this challenge, we propose a novel, universally applicable single-loop algorithm, the doubly smoothed gradient descent ascent method (DS-GDA), which naturally balances the primal and dual updates. That is, DS-GDA with the same hyperparameters is able to uniformly solve nonconvex-concave, convex-nonconcave, and nonconvex-nonconcave problems satisfying a one-sided KŁ property, achieving convergence with $\mathcal{O}(\epsilon^{-4})$ complexity. Sharper (even optimal) iteration complexity can be obtained when the KŁ exponent is known. Specifically, under the one-sided KŁ condition with exponent $\theta \in (0,1)$, DS-GDA converges with an iteration complexity of $\mathcal{O}(\epsilon^{-2\max\{2\theta,1\}})$. Both complexity bounds match the corresponding best results in the literature. Moreover, we show that DS-GDA is practically applicable to general nonconvex-nonconcave problems even without any regularity conditions, such as the PŁ condition, the KŁ condition, or the weak Minty variational inequality condition. On various challenging nonconvex-nonconcave examples from the literature, including ``Forsaken'', ``Bilinearly-coupled minimax'', ``Sixth-order polynomial'', and ``PolarGame'', the proposed DS-GDA escapes limit cycles in every case. To the best of our knowledge, this is the first first-order algorithm to achieve convergence on all of these formidable problems.
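
Since the abstract describes DS-GDA only at a high level (a single-loop scheme that smooths both the primal and the dual sides and balances their updates), the following Python sketch is an illustration of that idea, not the paper's prescribed algorithm. It assumes a doubly smoothed surrogate of the form $F(x, y; z, v) = f(x, y) + \tfrac{r_1}{2}\|x - z\|^2 - \tfrac{r_2}{2}\|y - v\|^2$, with a descent step in $x$, an ascent step in $y$, and auxiliary anchors $z, v$ tracked by exponential averaging; all step sizes and smoothing parameters are illustrative placeholders.

```python
import numpy as np

def ds_gda(grad_x, grad_y, x0, y0, steps=10_000,
           c=0.05, alpha=0.05, r1=1.0, r2=1.0, beta=0.1, mu=0.1):
    """Illustrative single-loop doubly smoothed GDA sketch (unconstrained case).

    Assumed surrogate: F(x, y; z, v) = f(x, y) + (r1/2)||x - z||^2 - (r2/2)||y - v||^2.
    Hyperparameters are placeholder values, not the paper's settings.
    """
    x, y = np.asarray(x0, dtype=float), np.asarray(y0, dtype=float)
    z, v = x.copy(), y.copy()                     # auxiliary smoothing anchors
    for _ in range(steps):
        # primal descent step on the doubly smoothed surrogate
        x = x - c * (grad_x(x, y) + r1 * (x - z))
        # dual ascent step on the doubly smoothed surrogate
        y = y + alpha * (grad_y(x, y) - r2 * (y - v))
        # anchors slowly track the iterates (exponential averaging)
        z = z + beta * (x - z)
        v = v + mu * (y - v)
    return x, y

# Toy call on the bilinear coupling f(x, y) = x * y, a setting where plain GDA
# is known to cycle; included only to show the interface.
x_star, y_star = ds_gda(lambda x, y: y, lambda x, y: x, x0=[1.0], y0=[1.0])
```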