Paper Title

Improving Adversarial Robustness via Mutual Information Estimation

Authors

Dawei Zhou, Nannan Wang, Xinbo Gao, Bo Han, Xiaoyu Wang, Yibing Zhan, Tongliang Liu

Abstract

Deep neural networks (DNNs) are found to be vulnerable to adversarial noise: they are typically misled by adversarial samples into making wrong predictions. To alleviate this negative effect, in this paper we investigate the dependence between the outputs of the target model and input adversarial samples from the perspective of information theory, and propose an adversarial defense method. Specifically, we first measure this dependence by estimating the mutual information (MI) between outputs and the natural patterns of inputs (called natural MI) and the MI between outputs and the adversarial patterns of inputs (called adversarial MI), respectively. We find that adversarial samples usually have larger adversarial MI and smaller natural MI than natural samples. Motivated by this observation, we propose to enhance adversarial robustness by maximizing the natural MI and minimizing the adversarial MI during the training process. In this way, the target model is expected to pay more attention to the natural pattern, which contains the objective semantics. Empirical evaluations demonstrate that our method can effectively improve adversarial accuracy against multiple attacks.
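
The abstract leaves the MI estimator and the training objective at a high level. As a rough illustration only, the sketch below shows one plausible way to instantiate the idea in PyTorch: a MINE-style statistics network gives a Donsker-Varadhan lower bound on MI, and the classifier is trained with cross-entropy plus terms that maximize natural MI and minimize adversarial MI. Everything here is an assumption for illustration, not the authors' implementation: the names StatNet, mine_lower_bound, and robust_mi_loss, the choice of the clean image as the "natural pattern" and the perturbation as the "adversarial pattern", and the weights lambda_nat / lambda_adv.

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class StatNet(nn.Module):
    # Statistics network T(x, y) for a MINE-style MI lower bound
    # (assumed architecture, not from the paper).
    def __init__(self, x_dim, y_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + y_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=-1))

def mine_lower_bound(stat_net, x, y):
    # Donsker-Varadhan bound: I(X;Y) >= E_{p(x,y)}[T] - log E_{p(x)p(y)}[exp(T)],
    # estimated on a mini-batch; shuffling y breaks the pairing so the second
    # term is taken over (approximate) samples from the product of marginals.
    joint = stat_net(x, y).mean()
    y_shuf = y[torch.randperm(y.size(0))]
    log_marg = torch.logsumexp(stat_net(x, y_shuf), dim=0) - math.log(y.size(0))
    return joint - log_marg.squeeze()

def robust_mi_loss(model, stat_nat, stat_adv, x_nat, x_adv, labels,
                   lambda_nat=1.0, lambda_adv=1.0):
    # Cross-entropy on the adversarial input, plus MI terms: reward dependence
    # of the output on the natural pattern, penalize dependence on the
    # adversarial pattern (both term weights are illustrative).
    logits = model(x_adv)
    # Illustrative decomposition: clean image as the natural pattern,
    # perturbation (x_adv - x_nat) as the adversarial pattern.
    mi_nat = mine_lower_bound(stat_nat, x_nat.flatten(1), logits)
    mi_adv = mine_lower_bound(stat_adv, (x_adv - x_nat).flatten(1), logits)
    return F.cross_entropy(logits, labels) - lambda_nat * mi_nat + lambda_adv * mi_adv

In a full training loop the statistics networks would typically be updated to tighten (maximize) their bounds while the classifier is updated with robust_mi_loss, the alternating scheme standard for MINE-based objectives; that scheduling is likewise assumed here rather than taken from the paper.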
