Detecting Patch Adversarial Attacks with Image Residuals

Paper Title

Detecting Patch Adversarial Attacks with Image Residuals

Authors

Marius Arvinte, Ahmed Tewfik, Sriram Vishwanath

Abstract

We introduce an adversarial sample detection algorithm based on image residuals, specifically designed to guard against patch-based attacks. The image residual is obtained as the difference between an input image and a denoised version of it, and a discriminator is trained to distinguish between clean and adversarial samples. More precisely, we use a wavelet domain algorithm for denoising images and demonstrate that the obtained residuals act as a digital fingerprint for adversarial attacks. To emulate the limitations of a physical adversary, we evaluate the performance of our approach against localized (patch-based) adversarial attacks, including in settings where the adversary has complete knowledge about the detection scheme. Our results show that the proposed detection method generalizes to previously unseen, stronger attacks and that it is able to reduce the success rate (conversely, increase the computational effort) of an adaptive attacker.
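
The residual computation described in the abstract (input image minus its wavelet-denoised version) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not name the wavelet algorithm, so a one-level Haar transform with soft thresholding stands in for it, and the function names (`haar2d`, `ihaar2d`, `wavelet_denoise`, `image_residual`) and the `threshold` value are hypothetical. The discriminator that would consume the residual is not shown.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar2d(x):
    """One-level orthonormal 2D Haar transform of an even-sized 2D array."""
    # Transform along columns of each row (horizontal pairs).
    a = (x[:, 0::2] + x[:, 1::2]) / SQRT2
    d = (x[:, 0::2] - x[:, 1::2]) / SQRT2
    # Transform along rows (vertical pairs).
    ll = (a[0::2] + a[1::2]) / SQRT2
    lh = (a[0::2] - a[1::2]) / SQRT2
    hl = (d[0::2] + d[1::2]) / SQRT2
    hh = (d[0::2] - d[1::2]) / SQRT2
    return ll, lh, hl, hh


def ihaar2d(ll, lh, hl, hh):
    """Inverse of haar2d."""
    a = np.empty((ll.shape[0] * 2, ll.shape[1]))
    d = np.empty_like(a)
    a[0::2], a[1::2] = (ll + lh) / SQRT2, (ll - lh) / SQRT2
    d[0::2], d[1::2] = (hl + hh) / SQRT2, (hl - hh) / SQRT2
    x = np.empty((a.shape[0], a.shape[1] * 2))
    x[:, 0::2], x[:, 1::2] = (a + d) / SQRT2, (a - d) / SQRT2
    return x


def soft_threshold(c, t):
    """Shrink coefficients toward zero by t (assumed denoising rule)."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)


def wavelet_denoise(image, threshold):
    """Denoise by shrinking only the detail (high-frequency) sub-bands."""
    ll, lh, hl, hh = haar2d(image)
    return ihaar2d(
        ll,
        soft_threshold(lh, threshold),
        soft_threshold(hl, threshold),
        soft_threshold(hh, threshold),
    )


def image_residual(image, threshold=0.1):
    """Residual = input image minus its denoised version."""
    image = np.asarray(image, dtype=np.float64)
    return image - wavelet_denoise(image, threshold)


# Toy usage: a flat grayscale image with a noisy 8x8 "patch" pasted in.
rng = np.random.default_rng(0)
clean = np.full((32, 32), 0.5)
patched = clean.copy()
patched[8:16, 8:16] += rng.uniform(-0.5, 0.5, size=(8, 8))

residual = image_residual(patched)
print("residual energy inside patch:", np.abs(residual[8:16, 8:16]).sum())
```

In this toy setting the residual is concentrated in the patched region, which is the kind of signal a discriminator could be trained on; real images and the wavelet method used in the paper would behave less cleanly.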
