Paper Title
Boosting Gradient for White-Box Adversarial Attacks
Paper Authors
Paper Abstract
Deep neural networks (DNNs) play key roles in various artificial intelligence applications such as image classification and object recognition. However, a growing number of studies have shown that DNNs are vulnerable to adversarial examples, which differ almost imperceptibly from the original samples but can greatly change the network output. Existing white-box attack algorithms can generate powerful adversarial examples, yet most of them concentrate on how to iteratively make the best use of gradients to improve adversarial performance. In contrast, in this paper we focus on the properties of the widely used ReLU activation function, and discover two phenomena (i.e., wrong blocking and over-transmission) that mislead the calculation of gradients through ReLU during backpropagation. Both issues enlarge the gap between the change of the loss function predicted from the gradients and the corresponding actual change, and thereby mislead the gradients, resulting in larger perturbations. Therefore, we propose a universal adversarial example generation method, called ADV-ReLU, to enhance the performance of gradient-based white-box attack algorithms. During the backpropagation of the network, our approach calculates the gradient of the loss function with respect to the network input, maps the values to scores, and selects a subset of them to correct the misleading gradients. Comprehensive experimental results on \emph{ImageNet} demonstrate that ADV-ReLU can be easily integrated into many state-of-the-art gradient-based white-box attack algorithms, as well as transferred to black-box attacks, to further decrease perturbations in the ${\ell _2}$-norm.
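The abstract only sketches how ADV-ReLU rescores and corrects misleading ReLU gradients. Below is a minimal PyTorch illustration of that idea, not the authors' exact formulation: the closeness-to-zero scoring rule, the selection ratio `k`, and the 0.5 damping factor are all assumptions made for the sketch.

```python
# Hedged sketch of an ADV-ReLU-style gradient correction, assuming a
# closeness-to-zero score, a selection ratio k, and a 0.5 damping factor.
import torch


class AdvReLU(torch.autograd.Function):
    """ReLU whose backward pass adjusts two misleading cases near zero:
    - 'wrong blocking':   x < 0 but close to 0, so the zero gradient hides
      that a small perturbation could activate the unit;
    - 'over-transmission': x > 0 but close to 0, so the full gradient
      overstates the change a small perturbation can actually cause.
    """

    @staticmethod
    def forward(ctx, x, k=0.05):
        ctx.save_for_backward(x)
        ctx.k = k  # fraction of units whose gradient is adjusted (assumed)
        return x.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        flat_x = x.reshape(-1)
        out_g = grad_output.reshape(-1)
        flat_g = out_g.clone()
        flat_g[flat_x < 0] = 0  # standard ReLU backward: block negatives

        # Score units by closeness to the zero boundary (assumed scoring;
        # the paper maps gradient values to scores and selects a subset).
        n = max(1, int(ctx.k * flat_x.numel()))
        idx = (-flat_x.abs()).topk(n).indices

        # 'Wrong blocking': re-open the gradient of near-zero negatives.
        blocked = idx[flat_x[idx] < 0]
        flat_g[blocked] = out_g[blocked]
        # 'Over-transmission': damp the gradient of near-zero positives.
        passed = idx[flat_x[idx] >= 0]
        flat_g[passed] = 0.5 * out_g[passed]  # damping factor assumed

        return flat_g.view_as(x), None
```

In an attack loop, one would substitute `AdvReLU.apply` for the network's ReLU activations before calling `backward()`, so that FGSM/PGD-style update steps use the corrected gradients rather than the standard ReLU gradients.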