Paper Title


PatchGuard: A Provably Robust Defense against Adversarial Patches via Small Receptive Fields and Masking

Paper Authors

Chong Xiang, Arjun Nitin Bhagoji, Vikash Sehwag, Prateek Mittal

Paper Abstract


Localized adversarial patches aim to induce misclassification in machine learning models by arbitrarily modifying pixels within a restricted region of an image. Such attacks can be realized in the physical world by attaching the adversarial patch to the object to be misclassified, and defending against them remains an open problem. In this paper, we propose a general defense framework called PatchGuard that can achieve high provable robustness while maintaining high clean accuracy against localized adversarial patches. The cornerstone of PatchGuard involves the use of CNNs with small receptive fields to impose a bound on the number of features corrupted by an adversarial patch. Given a bounded number of corrupted features, the problem of designing an adversarial patch defense reduces to that of designing a secure feature aggregation mechanism. Towards this end, we present our robust masking defense that robustly detects and masks corrupted features to recover the correct prediction. Notably, we can prove the robustness of our defense against any adversary within our threat model. Our extensive evaluation on ImageNet, ImageNette (a 10-class subset of ImageNet), and CIFAR-10 datasets demonstrates that our defense achieves state-of-the-art performance in terms of both provable robust accuracy and clean accuracy.
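To make the abstract's aggregation idea concrete, below is a minimal, illustrative NumPy sketch of a robust-masking style feature aggregation. It assumes per-location class evidence has already been extracted by a small-receptive-field CNN (e.g., a BagNet-style backbone), and that a patch can corrupt at most one spatial window of known size. The function name `robust_masking_predict` and the subtract-the-strongest-window simplification are ours for illustration; this is not the authors' exact algorithm.

```python
import numpy as np

def robust_masking_predict(feature_map, window_size):
    """Sketch of robust-masking aggregation (illustrative, not the paper's code).

    feature_map: (H, W, C) array of non-negative per-location class evidence
        (e.g., clipped logits from a small-receptive-field CNN).
    window_size: (wh, ww) upper bound on the feature-space footprint
        that a single adversarial patch can corrupt.
    """
    H, W, C = feature_map.shape
    wh, ww = window_size
    assert wh <= H and ww <= W, "window must fit inside the feature map"

    masked_scores = np.empty(C)
    for c in range(C):
        evidence = feature_map[:, :, c]
        total = evidence.sum()
        # A patch can only inject malicious evidence inside one window,
        # so find the sliding window with the highest class evidence...
        best = -np.inf
        for i in range(H - wh + 1):
            for j in range(W - ww + 1):
                best = max(best, evidence[i:i + wh, j:j + ww].sum())
        # ...and mask it out before aggregating the remaining features.
        masked_scores[c] = total - best
    return int(np.argmax(masked_scores))
```

Because the masked window covers every location a bounded patch could reach, the aggregated score for each class is computed only from features the adversary cannot fully control, which is what enables a provable robustness argument.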
