Paper Title
On Data Augmentation for GAN Training
Paper Authors
Paper Abstract
Recent successes in Generative Adversarial Networks (GAN) have affirmed the importance of using more data in GAN training. Yet it is expensive to collect data in many domains such as medical applications. Data Augmentation (DA) has been applied in these applications. In this work, we first argue that the classical DA approach could mislead the generator to learn the distribution of the augmented data, which could be different from that of the original data. We then propose a principled framework, termed Data Augmentation Optimized for GAN (DAG), to enable the use of augmented data in GAN training to improve the learning of the original distribution. We provide theoretical analysis to show that using our proposed DAG aligns with the original GAN in minimizing the Jensen-Shannon (JS) divergence between the original distribution and the model distribution. Importantly, the proposed DAG effectively leverages the augmented data to improve the learning of the discriminator and the generator. We conduct experiments to apply DAG to different GAN models (unconditional GAN, conditional GAN, self-supervised GAN, and CycleGAN) using datasets of natural images and medical images. The results show that DAG achieves consistent and considerable improvements across these models. Furthermore, when DAG is used in some GAN models, the system establishes state-of-the-art Fréchet Inception Distance (FID) scores. Our code is available.
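
The abstract's first claim, that classical DA can shift the training target away from the original distribution, can be illustrated with a toy sketch. The code below is not the paper's DAG method. It assumes a hypothetical discrete distribution over four outcomes and a single "flip" augmentation (both invented for illustration), and measures the JS divergence between the original distribution and the mixture that results from training on original plus flipped samples.

```python
import math

def kl(p, q):
    """KL divergence KL(p || q) for discrete distributions (natural log)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def flip(p):
    """Hypothetical augmentation: mirror the outcome space (like an image flip)."""
    return list(reversed(p))

def classical_da(p):
    """Distribution seen when training on original and flipped samples equally."""
    return [(a + b) / 2 for a, b in zip(p, flip(p))]

# Assumed toy data distribution (asymmetric, so flipping changes it).
p_data = [0.7, 0.2, 0.1, 0.0]
p_aug = classical_da(p_data)

# A positive gap means a generator fit to the augmented data
# would target something other than the original distribution.
gap = js_divergence(p_data, p_aug)
print(f"JS(p_data, p_aug) = {gap:.4f}")
```

When the data distribution is already invariant under the augmentation (for example, a uniform distribution under flipping), the gap is zero, so the mismatch in this sketch arises only when the augmentation actually changes the distribution.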