Paper Title
Initializing Perturbations in Multiple Directions for Fast Adversarial Training
Paper Authors
Paper Abstract
Recent developments in the field of deep learning have demonstrated that Deep Neural Networks (DNNs) are vulnerable to adversarial examples. Specifically, in image classification, an adversarial example can fool a well-trained deep neural network by adding barely imperceptible perturbations to a clean image. Adversarial training, one of the most direct and effective defenses, minimizes the loss on perturbed data to learn deep networks that are robust against adversarial attacks. It has been shown that fast adversarial training can be achieved using the fast gradient sign method (FGSM). However, FGSM-based adversarial training may ultimately yield a failed model because it overfits to FGSM samples. In this paper, we propose Diversified Initialized Perturbations Adversarial Training (DIP-FAT), which seeks the initialization of the perturbation by enlarging the output distances of the target model along random directions. Owing to the diversity of these random directions, the embedded fast adversarial training with FGSM increases the information obtained from the adversary and reduces the possibility of overfitting. Beyond preventing overfitting, extensive results show that the proposed DIP-FAT technique also improves accuracy on clean data. The biggest advantage of DIP-FAT is that it achieves the best balance among clean-data accuracy, perturbed-data accuracy, and efficiency.
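To make the idea concrete, below is a minimal PyTorch-style sketch of one training step in the spirit of the abstract: the perturbation is initialized along a random direction chosen to enlarge the model's output distance from the clean output, and a single FGSM step is then taken from that initialization. This is an illustration under stated assumptions, not the authors' exact DIP-FAT algorithm; the function name `dip_fat_step` and the hyper-parameters `epsilon`, `alpha`, and `num_candidates` are hypothetical.

```python
import torch
import torch.nn.functional as F

def dip_fat_step(model, x, y, epsilon=8/255, alpha=10/255, num_candidates=4):
    """Hypothetical fast-adversarial-training step (a sketch, not the paper's
    exact procedure): diversified random initialization followed by one FGSM step."""
    with torch.no_grad():
        clean_logits = model(x)
        best_delta, best_dist = None, -1.0
        # Diversified initialization: sample several random directions and keep
        # the one that most enlarges the model's output distance from the clean output.
        for _ in range(num_candidates):
            delta = torch.empty_like(x).uniform_(-epsilon, epsilon)
            dist = F.mse_loss(model((x + delta).clamp(0, 1)), clean_logits)
            if dist > best_dist:
                best_dist, best_delta = dist, delta

    # Single FGSM step starting from the chosen initialization.
    delta = best_delta.clone().requires_grad_(True)
    loss = F.cross_entropy(model((x + delta).clamp(0, 1)), y)
    grad = torch.autograd.grad(loss, delta)[0]
    delta = (delta + alpha * grad.sign()).clamp(-epsilon, epsilon).detach()
    x_adv = (x + delta).clamp(0, 1)

    # Adversarial-training loss on the perturbed batch; backpropagate and
    # update the model parameters with this value in the outer training loop.
    return F.cross_entropy(model(x_adv), y)
```

Sampling several candidate directions and keeping the one with the largest output distance is one plausible reading of "enlarging the output distances of the target model along random directions"; the paper may realize this selection differently.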