Paper Title

Adversarial Examples in Constrained Domains

Paper Authors

Ryan Sheatsley, Nicolas Papernot, Michael Weisman, Gunjan Verma, Patrick McDaniel

Paper Abstract

Machine learning algorithms have been shown to be vulnerable to adversarial manipulation through systematic modification of inputs (e.g., adversarial examples) in domains such as image recognition. Under the default threat model, the adversary exploits the unconstrained nature of images; each feature (pixel) is fully under the control of the adversary. However, it is not clear how these attacks translate to constrained domains that limit which features the adversary can modify and how (e.g., network intrusion detection). In this paper, we explore whether constrained domains are less vulnerable than unconstrained domains to adversarial example generation algorithms. We create an algorithm for generating adversarial sketches: targeted universal perturbation vectors which encode feature saliency within the envelope of domain constraints. To assess how these algorithms perform, we evaluate them in constrained (e.g., network intrusion detection) and unconstrained (e.g., image recognition) domains. The results demonstrate that our approaches generate misclassification rates in constrained domains comparable to those of unconstrained domains (greater than 95%). Our investigation shows that the narrow attack surface exposed by constrained domains is still sufficiently large to craft successful adversarial examples; thus, constraints do not appear to make a domain robust. Indeed, with as few as five randomly selected features, one can still generate adversarial examples.
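
The core mechanism the abstract describes, a saliency-guided perturbation restricted to the features an adversary is permitted to modify, can be illustrated with a minimal sketch. This is not the paper's adversarial-sketch algorithm: the toy logistic model (W, b), the five-feature mask, and the step size eps below are all illustrative assumptions.

```python
import numpy as np

# A minimal sketch of a constrained, saliency-guided perturbation.
# The toy logistic model (W, b), the feature mask, and eps are
# illustrative assumptions, not the paper's models or constraint sets.
rng = np.random.default_rng(0)
W = rng.normal(size=10)   # weights over 10 features
b = 0.1

def predict_proba(x):
    """Positive-class probability under the toy logistic model."""
    return 1.0 / (1.0 + np.exp(-(x @ W + b)))

def loss_grad(x, y):
    """Gradient of the logistic loss with respect to the input."""
    return (predict_proba(x) - y) * W

# Domain constraint: only five randomly chosen features are modifiable,
# mirroring the abstract's five-feature result.
modifiable = np.zeros(10, dtype=bool)
modifiable[rng.choice(10, size=5, replace=False)] = True

def constrained_perturbation(x, y_true, eps=0.3):
    """FGSM-style step, zeroed outside the modifiable-feature mask."""
    delta = eps * np.sign(loss_grad(x, y_true))
    delta[~modifiable] = 0.0              # enforce the domain constraint
    return np.clip(x + delta, 0.0, 1.0)   # keep features in a valid range

x = rng.uniform(size=10)
x_adv = constrained_perturbation(x, y_true=0)
print("clean score:", predict_proba(x), "adversarial score:", predict_proba(x_adv))
```

Zeroing the gradient step outside the mask is the simplest way to encode a "which features may change" constraint; the paper's domain constraints are richer (they also govern how features may change), but even this crude restriction shows why a five-feature attack surface can still admit effective perturbations.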
