Paper Title
D-square-B: Deep Distribution Bound for Natural-looking Adversarial Attack
Paper Authors

Paper Abstract
We propose a novel technique that generates natural-looking adversarial examples by bounding the variations induced in internal activation values at some deep layer(s), using a per-neuron distribution quantile bound and a polynomial barrier loss function. By bounding model internals instead of individual pixels, our attack admits perturbations that are closely coupled with the existing features of the original input, allowing the generated examples to look natural while having diverse and often substantial pixel distances from the original input. Enforcing the bounds per neuron addresses the non-uniformity of internal activation values. Our evaluation on ImageNet across five different model architectures demonstrates that our attack is highly effective. Compared to state-of-the-art pixel-space, semantic, and feature-space attacks, it achieves the same attack success rate and confidence level while producing much more natural-looking adversarial perturbations. These perturbations piggyback on existing local features and are not constrained by any fixed pixel bound.
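The abstract does not give the exact formulation, so the following is only a minimal PyTorch sketch of the two ingredients it names: per-neuron quantile bounds estimated from natural activations, and a polynomial barrier loss that keeps adversarial activations inside those bounds. The targeted cross-entropy objective, the barrier degree, the choice to bound absolute activations (rather than activation deltas), and helper names such as `per_neuron_quantile_bounds` and `polynomial_barrier` are all illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def per_neuron_quantile_bounds(nat_acts, q=0.99):
    """Per-neuron (per-channel) activation bounds estimated on benign inputs.

    nat_acts: (N, C, H, W) activations collected from natural images.
    Returns (lo, hi), each of shape (C,).
    """
    flat = nat_acts.permute(1, 0, 2, 3).reshape(nat_acts.size(1), -1)
    lo = torch.quantile(flat, 1.0 - q, dim=1)
    hi = torch.quantile(flat, q, dim=1)
    return lo, hi

def polynomial_barrier(acts, lo, hi, degree=4):
    """Polynomial penalty on activations outside [lo, hi]; zero inside."""
    lo = lo.view(1, -1, 1, 1)
    hi = hi.view(1, -1, 1, 1)
    overflow = F.relu(acts - hi) + F.relu(lo - acts)
    return (overflow ** degree).mean()

def attack(model, layer, x, target, lo, hi, steps=200, lr=0.01, lam=10.0):
    """Optimize an adversarial image whose deep activations stay in-bound."""
    cache = {}
    handle = layer.register_forward_hook(lambda m, i, o: cache.update(acts=o))
    x_adv = x.clone().requires_grad_(True)
    opt = torch.optim.Adam([x_adv], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        logits = model(x_adv)
        # Push toward the target class while the barrier term bounds
        # the per-neuron activation drift in the hooked layer.
        loss = F.cross_entropy(logits, target) \
             + lam * polynomial_barrier(cache["acts"], lo, hi)
        loss.backward()
        opt.step()
        with torch.no_grad():
            x_adv.clamp_(0.0, 1.0)  # keep a valid image
    handle.remove()
    return x_adv.detach()
```

One property worth noting about this barrier choice: it is exactly zero inside the bounds, so in-bound activations are unconstrained, while the polynomial growth outside produces gradients that steer out-of-bound activations back inside without imposing any fixed pixel-space budget.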