Paper Title
WeMix: How to Better Utilize Data Augmentation
Paper Authors
Paper Abstract
Data augmentation is a widely used training trick in deep learning to improve the generalization ability of networks. Despite many encouraging results, several recent studies have pointed out limitations of the conventional data augmentation scheme in certain scenarios, calling for a better theoretical understanding of data augmentation. In this work, we develop a comprehensive analysis that reveals the pros and cons of data augmentation. The main limitation of data augmentation arises from data bias, i.e., the augmented data distribution can be quite different from the original one. This data bias leads to suboptimal performance of existing data augmentation methods. To this end, we develop two novel algorithms, termed "AugDrop" and "MixLoss", to correct the data bias in data augmentation. Our theoretical analysis shows that both algorithms are guaranteed to improve the effect of data augmentation through bias correction, which is further validated by our empirical studies. Finally, we propose a generic algorithm, "WeMix", by combining AugDrop and MixLoss, whose effectiveness is observed in extensive empirical evaluations.