Paper Title

Limiting Behaviors of Nonconvex-Nonconcave Minimax Optimization via Continuous-Time Systems

Paper Authors

Benjamin Grimmer, Haihao Lu, Pratik Worah, Vahab Mirrokni

Paper Abstract

Unlike nonconvex optimization, where gradient descent is guaranteed to converge to a local optimizer, algorithms for nonconvex-nonconcave minimax optimization can have topologically different solution paths: sometimes converging to a solution, sometimes never converging and instead following a limit cycle, and sometimes diverging. In this paper, we study the limiting behaviors of three classic minimax algorithms: gradient descent ascent (GDA), alternating gradient descent ascent (AGDA), and the extragradient method (EGM). Numerically, we observe that all of these limiting behaviors can arise in Generative Adversarial Network (GAN) training and are easily demonstrated for a range of GAN problems. To explain these different behaviors, we study the high-order resolution continuous-time dynamics that correspond to each algorithm, which yields sufficient (and almost necessary) conditions for the local convergence of each method. Moreover, this ODE perspective allows us to characterize the phase transition between these different limiting behaviors, caused by introducing regularization, as Hopf bifurcations.
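As a quick illustration of the three algorithms named in the abstract (a minimal sketch, not taken from the paper), the snippet below runs GDA, AGDA, and EGM on the toy bilinear problem f(x, y) = xy, whose unique saddle point is the origin. On this toy problem the three qualitative outcomes mentioned above are easy to observe: GDA spirals outward, AGDA stays on a bounded cycle, and EGM converges. The step size, starting point, and iteration count are arbitrary assumptions made only for the demonstration.

```python
# Illustrative sketch (assumptions: f(x, y) = x * y, eta = 0.1, start (1, 1)).
# Compares the limiting behavior of GDA, AGDA, and EGM on a toy bilinear game.
import math

eta = 0.1      # step size (assumed, for illustration only)
steps = 200

def grad(x, y):
    # f(x, y) = x * y  =>  df/dx = y,  df/dy = x
    return y, x

def gda(x, y):
    # simultaneous gradient descent ascent
    gx, gy = grad(x, y)
    return x - eta * gx, y + eta * gy

def agda(x, y):
    # alternating gradient descent ascent: update x first, then y at the new x
    gx, _ = grad(x, y)
    x_new = x - eta * gx
    _, gy = grad(x_new, y)
    return x_new, y + eta * gy

def egm(x, y):
    # extragradient: take a lookahead step, then update with the lookahead gradient
    gx, gy = grad(x, y)
    x_half, y_half = x - eta * gx, y + eta * gy
    gx, gy = grad(x_half, y_half)
    return x - eta * gx, y + eta * gy

for name, step in [("GDA", gda), ("AGDA", agda), ("EGM", egm)]:
    x, y = 1.0, 1.0
    for _ in range(steps):
        x, y = step(x, y)
    # distance to the saddle point (0, 0): grows for GDA, stays O(1) for AGDA,
    # and shrinks toward zero for EGM on this bilinear example
    print(f"{name}: distance to saddle after {steps} steps = {math.hypot(x, y):.3e}")
```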
