Paper Title
Semantics-Preserving Adversarial Training
Paper Authors
Paper Abstract
Adversarial training is a defense technique that improves the adversarial robustness of a deep neural network (DNN) by including adversarial examples in the training data. In this paper, we identify an overlooked problem of adversarial training: these adversarial examples often have different semantics than the original data, introducing unintended biases into the model. We hypothesize that such non-semantics-preserving (and consequently ambiguous) adversarial data harm the robustness of the target models. To mitigate such unintended semantic changes of adversarial examples, we propose semantics-preserving adversarial training (SPAT), which encourages perturbation only on the pixels that are shared among all classes when generating adversarial examples in the training stage. Experimental results show that SPAT improves adversarial robustness and achieves state-of-the-art results on CIFAR-10 and CIFAR-100.
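The core idea, restricting adversarial perturbation to a set of "shared" pixels, can be sketched with a single masked FGSM step. This is a minimal illustration, not the paper's actual method: it assumes a plain linear softmax classifier (so the input gradient is analytic) and a hand-fixed binary mask, whereas SPAT derives the shared-pixel set from the classes themselves.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def masked_fgsm(x, y, W, eps, mask):
    """One FGSM step confined to masked pixels.

    Hypothetical stand-in for SPAT's idea of perturbing only pixels
    shared among all classes; `mask` is fixed by hand here.
    """
    p = softmax(W @ x)                    # class probabilities
    onehot = np.zeros_like(p)
    onehot[y] = 1.0
    grad_x = W.T @ (p - onehot)           # d(cross-entropy)/dx for a linear softmax model
    return x + eps * mask * np.sign(grad_x)

# Toy setup: 3 classes, 8-dimensional "images".
W = rng.normal(size=(3, 8))
x = rng.normal(size=8)
y = 0
mask = np.array([1, 1, 0, 0, 1, 0, 1, 0], dtype=float)  # hypothetical shared-pixel mask

x_adv = masked_fgsm(x, y, W, eps=0.1, mask=mask)
```

By construction, pixels outside the mask are untouched and the remaining perturbation stays within the epsilon budget, so the example never changes semantics through the "unshared" pixels.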