Paper Title
Regularizers for Single-step Adversarial Training
Paper Authors
Paper Abstract
The progress made in the last decade has enabled machine learning models to achieve impressive performance across a wide range of tasks in Computer Vision. However, a plethora of works have demonstrated the susceptibility of these models to adversarial samples. Adversarial training procedures have been proposed to defend against such adversarial attacks. Adversarial training methods augment mini-batches with adversarial samples, and typically single-step (non-iterative) methods are used for generating these adversarial samples. However, models trained using single-step adversarial training converge to degenerate minima where the model merely appears to be robust. The pseudo robustness of these models is due to the gradient masking effect. Although multi-step adversarial training helps to learn robust models, it is hard to scale due to the use of iterative methods for generating adversarial samples. To address these issues, we propose three different types of regularizers that help to learn robust models using single-step adversarial training methods. The proposed regularizers mitigate the effect of gradient masking by harnessing properties that differentiate a robust model from a pseudo-robust model. The performance of models trained using the proposed regularizers is on par with that of models trained using computationally expensive multi-step adversarial training methods.
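To make the setting concrete, below is a minimal sketch of the single-step adversarial training loop the abstract refers to, using FGSM to generate the adversarial samples. This is an illustration under assumptions, not the paper's method: the model, optimizer, `epsilon`, and mixing weight `alpha` are hypothetical placeholders, inputs are assumed to lie in [0, 1], and the paper's three proposed regularizers are not shown here.

```python
# Sketch of single-step (FGSM) adversarial training in PyTorch.
# All hyperparameters below are illustrative, not taken from the paper.
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon):
    """Generate adversarial samples with a single gradient step (FGSM)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    # Single-step: move each input by epsilon in the direction of the
    # sign of the loss gradient, then clamp back to the valid range.
    x_adv = (x_adv + epsilon * grad.sign()).clamp(0.0, 1.0)
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y,
                              epsilon=8 / 255, alpha=0.5):
    """One mini-batch update mixing clean and adversarial losses."""
    model.train()
    x_adv = fgsm_perturb(model, x, y, epsilon)
    optimizer.zero_grad()
    # Augment the mini-batch: combine the loss on clean samples with
    # the loss on their single-step adversarial counterparts.
    loss = alpha * F.cross_entropy(model(x), y) \
        + (1 - alpha) * F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the paper's full approach, one of the three proposed regularizers would be added to this training loss to discourage the gradient masking that makes models trained this way only appear robust.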