Paper Title

Bayesian Learning with Information Gain Provably Bounds Risk for a Robust Adversarial Defense

Authors

Doan, Bao Gia, Abbasnejad, Ehsan, Shi, Javen Qinfeng, Ranasinghe, Damith C.

Abstract

We present a new algorithm to learn a deep neural network model robust against adversarial attacks. Previous algorithms demonstrate that an adversarially trained Bayesian Neural Network (BNN) provides improved robustness. We recognize that the adversarial learning approach for approximating the multi-modal posterior distribution of a Bayesian model can lead to mode collapse; consequently, the model's achievements in robustness and performance are sub-optimal. Instead, we first propose preventing mode collapse to better approximate the multi-modal posterior distribution. Second, based on the intuition that a robust model should ignore perturbations and only consider the informative content of the input, we conceptualize and formulate an information gain objective to measure and force the information learned from both benign and adversarial training instances to be similar. Importantly, we prove and demonstrate that minimizing the information gain objective allows the adversarial risk to approach the conventional empirical risk. We believe our efforts provide a step toward a basis for a principled method of adversarially training BNNs. Our model demonstrates significantly improved robustness (up to 20%) compared with adversarial training and Adv-BNN under PGD attacks with 0.035 distortion on both CIFAR-10 and STL-10 datasets.
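The abstract describes an information gain objective that forces the information learned from benign and adversarial training instances to be similar. The sketch below illustrates what such a training step could look like; it is not the authors' implementation. It assumes a Monte-Carlo-dropout approximation of the BNN posterior, a standard L-infinity PGD attack, and a symmetric KL divergence between benign and adversarial predictive distributions as a stand-in for the information-gain term. The network architecture, PGD hyperparameters, and `kl_weight` are illustrative assumptions.

```python
# Minimal, illustrative sketch (NOT the paper's exact method): an MC-dropout
# "Bayesian" classifier trained so that its predictive distributions on benign
# and PGD-perturbed inputs agree, alongside adversarial training.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MCDropoutNet(nn.Module):
    def __init__(self, in_dim=784, hidden=256, n_classes=10, p=0.2):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, n_classes)
        self.p = p

    def forward(self, x):
        # Dropout stays active at every call, so repeated forward passes
        # behave like samples from an approximate posterior predictive.
        h = F.dropout(F.relu(self.fc1(x)), p=self.p, training=True)
        return self.fc2(h)

def predictive(model, x, n_samples=5):
    # Average softmax over stochastic forward passes: approximate predictive distribution.
    probs = torch.stack([F.softmax(model(x), dim=-1) for _ in range(n_samples)])
    return probs.mean(dim=0)

def pgd_attack(model, x, y, eps=0.035, alpha=0.01, steps=7):
    # Standard L-infinity PGD on the cross-entropy loss (hyperparameters are illustrative).
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)
        x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()

def train_step(model, optimizer, x, y, kl_weight=1.0):
    x_adv = pgd_attack(model, x, y)
    p_benign = predictive(model, x)
    p_adv = predictive(model, x_adv)
    # Classification loss on adversarial examples (adversarial training).
    ce = F.nll_loss(torch.log(p_adv + 1e-8), y)
    # Information-gain-style regularizer (assumed proxy): a symmetric KL term
    # that forces the benign and adversarial predictive distributions to match.
    kl = (F.kl_div(torch.log(p_adv + 1e-8), p_benign, reduction="batchmean")
          + F.kl_div(torch.log(p_benign + 1e-8), p_adv, reduction="batchmean"))
    loss = ce + kl_weight * kl
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    model = MCDropoutNet()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    x = torch.rand(32, 784)            # dummy batch of inputs in [0, 1]
    y = torch.randint(0, 10, (32,))    # dummy labels
    print(train_step(model, opt, x, y))
```

As the abstract claims, driving such a similarity term to zero makes the loss on adversarial inputs track the loss on benign inputs, which is the intuition behind the adversarial risk approaching the conventional empirical risk.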
