Paper Title

Convergence and Sample Complexity of SGD in GANs

Paper Authors

Vasilis Kontonis, Sihan Liu, Christos Tzamos

Paper Abstract

We provide theoretical convergence guarantees for training Generative Adversarial Networks (GANs) via SGD. We consider learning a target distribution modeled by a 1-layer Generator network with a non-linear activation function $\phi(\cdot)$, parametrized by a $d \times d$ weight matrix $\mathbf W_*$, i.e., $f_*(\mathbf x) = \phi(\mathbf W_* \mathbf x)$. Our main result is that training the Generator together with a Discriminator via the Stochastic Gradient Descent-Ascent (SGDA) iteration proposed by Goodfellow et al. yields a Generator distribution that approaches the target distribution $f_*$. Specifically, we can learn the target distribution within total-variation distance $\epsilon$ using $\tilde O(d^2/\epsilon^2)$ samples, which is (near-)information-theoretically optimal. Our results apply to a broad class of non-linear activation functions $\phi$, including ReLUs, and are enabled by a connection with truncated statistics and an appropriate design of the Discriminator network. Our approach relies on a bilevel optimization framework to show that vanilla SGDA works.
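To make the SGDA iteration concrete, below is a minimal NumPy sketch of alternating stochastic descent-ascent for a one-layer ReLU generator $f(\mathbf x) = \phi(\mathbf W \mathbf x)$. The linear discriminator $D(\mathbf y) = \mathbf v^\top \mathbf y$, the step sizes, and the iteration count are illustrative assumptions, not the paper's construction; the paper's actual Discriminator is designed specifically through the connection to truncated statistics.

```python
import numpy as np

# Hypothetical setup: all constants and the linear discriminator below are
# illustrative stand-ins, not the paper's construction.
rng = np.random.default_rng(0)
d = 5
W_star = rng.normal(size=(d, d))      # unknown target weights W_*

def phi(z):                           # ReLU activation
    return np.maximum(z, 0.0)

W = rng.normal(size=(d, d))           # Generator parameters (descent player)
v = np.zeros(d)                       # Discriminator parameters (ascent player)
eta_g, eta_d = 1e-3, 1e-2             # step sizes (assumed, untuned)

for t in range(5000):
    x = rng.normal(size=d)                      # latent Gaussian sample
    real = phi(W_star @ rng.normal(size=d))     # fresh sample from the target f_*
    fake = phi(W @ x)                           # Generator sample

    # Ascent step on the Discriminator objective E[D(real)] - E[D(fake)]
    # with D(y) = v . y, whose stochastic gradient in v is (real - fake).
    v += eta_d * (real - fake)

    # Descent step on the Generator loss -D(fake) = -v . phi(W x);
    # the ReLU subgradient keeps only coordinates where (W x) > 0.
    grad_W = -np.outer(v * (W @ x > 0), x)
    W -= eta_g * grad_W
```

With a stand-in linear discriminator like this, the loop can only match first moments of the two distributions; the sketch is meant to show the alternating update order of vanilla SGDA, not to reproduce the paper's guarantee.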
