Paper Title

Towards Feature Space Adversarial Attack

Paper Authors

Qiuling Xu, Guanhong Tao, Siyuan Cheng, Xiangyu Zhang

Paper Abstract

We propose a new adversarial attack against deep neural networks for image classification. Unlike most existing attacks, which directly perturb input pixels, our attack perturbs abstract features, more specifically, features that denote styles, including interpretable styles such as vivid colors and sharp outlines, as well as uninterpretable ones. It induces model misclassification by injecting imperceptible style changes through an optimization procedure. We show that our attack can generate adversarial samples that are more natural-looking than those produced by state-of-the-art unbounded attacks. The experiments also support that existing pixel-space adversarial attack detection and defense techniques can hardly ensure robustness in the style-related feature space.
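To make the optimization procedure described in the abstract concrete, below is a minimal PyTorch sketch of a style-space attack under stated assumptions: it presumes a pretrained style-transfer encoder/decoder pair (an AdaIN-style setup) and a target classifier are available, and perturbs the channel-wise mean/std style statistics of the encoder features rather than raw pixels. The names `encoder`, `decoder`, `classifier`, and the regularization weight `lam` are illustrative assumptions, not the authors' exact formulation.

```python
# A minimal sketch of a feature-space (style) adversarial attack,
# assuming pretrained encoder/decoder/classifier modules exist.
import torch
import torch.nn.functional as F

def adain(content_feat, mean, std, eps=1e-5):
    # Re-normalize content features to the given channel-wise style statistics.
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    return (content_feat - c_mean) / c_std * std + mean

def style_attack(encoder, decoder, classifier, x, y_true,
                 steps=200, lr=0.01, lam=10.0):
    """Optimize a perturbation of the style statistics (channel-wise
    mean/std) of the encoder features so the decoded image is misclassified
    while the style shift stays small (a stand-in for imperceptibility)."""
    with torch.no_grad():
        feat = encoder(x)
        mean0 = feat.mean(dim=(2, 3), keepdim=True)
        std0 = feat.std(dim=(2, 3), keepdim=True)
    # Learnable offsets on the style statistics, initialized at zero.
    d_mean = torch.zeros_like(mean0, requires_grad=True)
    d_std = torch.zeros_like(std0, requires_grad=True)
    opt = torch.optim.Adam([d_mean, d_std], lr=lr)
    for _ in range(steps):
        x_adv = decoder(adain(encoder(x), mean0 + d_mean, std0 + d_std))
        x_adv = x_adv.clamp(0, 1)
        logits = classifier(x_adv)
        # Untargeted attack: push the true class down; penalize large shifts.
        loss = (-F.cross_entropy(logits, y_true)
                + lam * (d_mean.pow(2).mean() + d_std.pow(2).mean()))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return x_adv.detach()
```

The quadratic penalty here is only a simple proxy for the imperceptibility constraint described in the abstract; in practice one would tune `lam` (or swap in a perceptual metric) to trade off attack success against how natural the decoded sample looks.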
