Paper Title
Fast Adversarial Training with Noise Augmentation: A Unified Perspective on RandStart and GradAlign
Paper Authors
Paper Abstract
PGD-based and FGSM-based methods are two popular adversarial training (AT) approaches for obtaining adversarially robust models. Compared with PGD-based AT, FGSM-based AT is significantly faster but fails due to catastrophic overfitting (CO). To mitigate CO in such Fast AT, there are two popular existing strategies: random start (RandStart) and Gradient Alignment (GradAlign). The former works only for a relatively small perturbation size of 8/255 under the l_\infty constraint; GradAlign improves on it by extending the perturbation size to 16/255 (under the l_\infty constraint), but at the cost of being 3 to 4 times slower. How to avoid CO in Fast AT for a large perturbation size without increasing the computation overhead remains an unsolved issue, for which our work provides a frustratingly simple (yet effective) solution. Specifically, our solution lies in just noise augmentation (NoiseAug), which is a non-trivial byproduct of simplifying GradAlign. By simplifying GradAlign we obtain two findings: (i) aligning logits instead of gradients in GradAlign requires half the training time yet achieves higher performance than GradAlign; (ii) the alignment operation can also be removed entirely by keeping only noise augmentation (NoiseAug). Simplified from GradAlign, our NoiseAug bears a surprising resemblance to RandStart, except that we inject noise into the image instead of the perturbation. To understand why injecting noise into the input prevents CO, we verify that this is caused not by a data augmentation effect (injecting noise into the image) but by improved local linearity. We provide an intuitive explanation of why NoiseAug improves local linearity without explicit regularization. Extensive results demonstrate that our NoiseAug achieves SOTA results in FGSM AT. The code will be released upon acceptance.
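The abstract hinges on the distinction between randomizing the FGSM starting point (RandStart) and injecting noise into the image itself (NoiseAug). The following minimal PyTorch sketch illustrates that distinction under stated assumptions; it is not the authors' released code, and the function names, the `noise_mag` hyperparameter, and the uniform noise distribution are illustrative choices rather than details given in the abstract.

```python
# Minimal sketch contrasting RandStart and NoiseAug in FGSM adversarial training.
# Assumptions (not from the paper): model, x in [0,1], labels y, step size alpha,
# l_inf budget eps, and a uniform noise magnitude noise_mag.
import torch
import torch.nn.functional as F

def fgsm_step(model, x_adv, y, x_ref, eps, alpha):
    """One FGSM step on x_adv, projected into the l_inf ball of radius eps around x_ref."""
    x_adv = x_adv.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    x_new = x_adv + alpha * grad.sign()
    x_new = torch.min(torch.max(x_new, x_ref - eps), x_ref + eps)  # l_inf projection
    return torch.clamp(x_new, 0.0, 1.0)

def adv_example_randstart(model, x, y, eps, alpha):
    # RandStart: the random draw is part of the *perturbation*; FGSM starts from a
    # random point inside the l_inf ball around the clean image x.
    delta = torch.empty_like(x).uniform_(-eps, eps)
    x_start = torch.clamp(x + delta, 0.0, 1.0)
    return fgsm_step(model, x_start, y, x_ref=x, eps=eps, alpha=alpha)

def adv_example_noiseaug(model, x, y, eps, alpha, noise_mag):
    # NoiseAug (as described in the abstract): noise is injected into the *image*,
    # so the noisy image serves as the reference point, and FGSM runs from it
    # without a random start. noise_mag is an assumed hyperparameter.
    noise = torch.empty_like(x).uniform_(-noise_mag, noise_mag)
    x_noisy = torch.clamp(x + noise, 0.0, 1.0)
    return fgsm_step(model, x_noisy, y, x_ref=x_noisy, eps=eps, alpha=alpha)
```

In both cases the returned example is used for the standard training update; the only difference is whether the random component counts against the perturbation budget (RandStart) or modifies the input the budget is measured from (NoiseAug).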