Paper Title

On the Robustness of Bayesian Neural Networks to Adversarial Attacks

Paper Authors

Luca Bortolussi, Ginevra Carbone, Luca Laurenti, Andrea Patane, Guido Sanguinetti, Matthew Wicker

Paper Abstract

Vulnerability to adversarial attacks is one of the principal hurdles to the adoption of deep learning in safety-critical applications. Despite significant efforts, both practical and theoretical, training deep learning models robust to adversarial attacks is still an open problem. In this paper, we analyse the geometry of adversarial attacks in the large-data, overparameterized limit for Bayesian Neural Networks (BNNs). We show that, in the limit, vulnerability to gradient-based attacks arises as a result of degeneracy in the data distribution, i.e., when the data lies on a lower-dimensional submanifold of the ambient space. As a direct consequence, we demonstrate that in this limit BNN posteriors are robust to gradient-based adversarial attacks. Crucially, we prove that the expected gradient of the loss with respect to the BNN posterior distribution is vanishing, even when each neural network sampled from the posterior is vulnerable to gradient-based attacks. Experimental results on the MNIST, Fashion MNIST, and half moons datasets, representing the finite data regime, with BNNs trained with Hamiltonian Monte Carlo and Variational Inference, support this line of argument, showing that BNNs can display both high accuracy on clean data and robustness to both gradient-based and gradient-free adversarial attacks.
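To make the central quantity concrete, the sketch below shows how the posterior expectation of the input-loss gradient, E_{w~p(w|D)}[∇_x ℓ(x, y; w)], would be estimated by Monte Carlo over posterior samples; this is the quantity the paper proves vanishes in the large-data, overparameterized limit. The sketch is illustrative only and is not the authors' code: it assumes PyTorch, and it substitutes an ensemble of randomly initialized networks for genuine HMC/VI posterior samples (make_net and posterior_samples are hypothetical names introduced here).

```python
# Minimal sketch: Monte Carlo estimate of the posterior-averaged input gradient.
# Assumption: randomly initialized networks stand in for posterior samples
# w_1, ..., w_S ~ p(w | D); the paper obtains such samples via HMC or VI.
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_net():
    # Hypothetical small classifier on 2-D inputs (e.g., half moons-like data).
    return nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 2))

posterior_samples = [make_net() for _ in range(50)]  # stand-ins for posterior draws

x = torch.tensor([[0.5, -0.3]], requires_grad=True)  # test input
y = torch.tensor([1])                                # its label
loss_fn = nn.CrossEntropyLoss()

# Per-sample input gradients: each individual network has a non-zero gradient,
# so each one taken alone can be attacked with gradient-based methods.
grads = []
for net in posterior_samples:
    if x.grad is not None:
        x.grad = None  # clear accumulated gradient before the next sample
    loss = loss_fn(net(x), y)
    loss.backward()
    grads.append(x.grad.clone())

per_sample_norms = torch.stack([g.norm() for g in grads])
expected_grad = torch.stack(grads).mean(dim=0)  # MC estimate of E_w[∇_x loss]

print("mean per-sample gradient norm:", per_sample_norms.mean().item())
print("norm of posterior-averaged gradient:", expected_grad.norm().item())
```

With this toy ensemble the averaged gradient shrinks only through incidental cancellation; the paper's result is stronger, showing that for true BNN posteriors the expectation itself vanishes in the stated limit even though every sampled network remains individually vulnerable.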
