Paper Title
Adversarial Defense by Latent Style Transformations
Paper Authors
Paper Abstract
Machine learning models have demonstrated vulnerability to adversarial attacks, more specifically the misclassification of adversarial examples. In this paper, we investigate an attack-agnostic defense against adversarial attacks on high-resolution images by detecting suspicious inputs. The intuition behind our approach is that the essential characteristics of a normal image are generally preserved under non-essential style transformations, e.g., slightly changing the facial expression of a human portrait. In contrast, adversarial examples are generally sensitive to such transformations. To detect adversarial instances, we propose an in\underline{V}ertible \underline{A}utoencoder based on the \underline{S}tyleGAN2 generator via \underline{A}dversarial training (VASA), which inverts images into disentangled latent codes that reveal hierarchical styles. We then build a set of edited copies with non-essential style transformations by performing latent shifting and reconstruction, based on the correspondences between latent codes and style transformations. The classification consistency of these edited copies is used to distinguish adversarial instances.
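To make the detection pipeline described in the abstract concrete, below is a minimal sketch of the classification-consistency check, under stated assumptions: encoder (a VASA-style inverter mapping an image to a latent code), generator (a StyleGAN2-style decoder), classifier (the protected model), and style_directions (latent directions corresponding to non-essential style edits) are hypothetical placeholders, and the strength and agreement_threshold values are illustrative defaults, not taken from the paper.

import torch

@torch.no_grad()
def is_suspicious(x, encoder, generator, classifier, style_directions,
                  strength=1.0, agreement_threshold=0.5):
    """Flag x if its predicted label is inconsistent across non-essential style edits."""
    # Invert the input image to a disentangled latent code (VASA-style inversion; hypothetical API).
    w = encoder(x)
    original_label = classifier(x).argmax(dim=-1)

    agreements = []
    for direction in style_directions:
        # Latent shifting along one non-essential style direction
        # (e.g. a slight change of facial expression), followed by reconstruction.
        x_edited = generator(w + strength * direction)
        edited_label = classifier(x_edited).argmax(dim=-1)
        agreements.append((edited_label == original_label).float())

    # Normal images tend to keep their label under such edits; adversarial
    # examples tend to flip, so a low agreement rate flags the input.
    agreement_rate = torch.stack(agreements).mean().item()
    return agreement_rate < agreement_threshold

Aggregating the edited copies into a simple agreement rate is only one way to realize the abstract's "classification consistency"; the paper itself may combine the copies' predictions differently.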