Paper Title

One-shot Neural Backdoor Erasing via Adversarial Weight Masking

Paper Authors

Shuwen Chai, Jinghui Chen

Paper Abstract

Recent studies show that despite achieving high accuracy on a number of real-world applications, deep neural networks (DNNs) can be backdoored: by injecting triggered data samples into the training dataset, the adversary can mislead the trained model into classifying any test data into the target class whenever the trigger pattern is present. To nullify such backdoor threats, various methods have been proposed. In particular, a line of research aims to purify the potentially compromised model. However, one major limitation of this line of work is the requirement of access to sufficient original training data: the purifying performance is much worse when the available training data is limited. In this work, we propose Adversarial Weight Masking (AWM), a novel method capable of erasing neural backdoors even in the one-shot setting. The key idea behind our method is to formulate this as a min-max optimization problem: first, adversarially recover the trigger patterns, and then (soft) mask the network weights that are sensitive to the recovered patterns. Comprehensive evaluations on several benchmark datasets suggest that AWM largely improves the purifying effects over other state-of-the-art methods across various available training dataset sizes.
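The abstract only sketches the min-max formulation, so below is a minimal PyTorch sketch of the idea, not the paper's exact algorithm: an inner maximization that adversarially recovers a universal trigger (a blending mask `m` and a pattern `delta`), and an outer minimization that learns soft masks over the weights, penalized toward 1 so that only trigger-sensitive weights are suppressed. All names, hyperparameters, and regularizer coefficients (`awm_purify`, `lr_mask`, the L1 weights) are illustrative assumptions; the paper's exact objective and regularizers differ.

```python
import torch
import torch.nn.functional as F

def awm_purify(model, loader, steps=100, inner_steps=5,
               lr_mask=1e-2, lr_trigger=1e-1):
    """Sketch of adversarial weight masking (simplified objective).

    Alternates between (1) an inner max that recovers a universal trigger
    the current masked model is most sensitive to, and (2) an outer min
    that learns soft weight masks neutralizing that trigger while
    preserving clean accuracy. Coefficients below are placeholders.
    """
    device = next(model.parameters()).device
    for p in model.parameters():          # only the masks are trained
        p.requires_grad_(False)

    x0, _ = next(iter(loader))
    # Universal trigger: a blending mask m (pre-sigmoid) and a pattern delta.
    m = torch.zeros(1, *x0.shape[1:], device=device, requires_grad=True)
    delta = torch.zeros(1, *x0.shape[1:], device=device, requires_grad=True)
    # One soft mask in [0, 1] per weight tensor, initialized to keep all weights.
    masks = {name: torch.ones_like(p, requires_grad=True)
             for name, p in model.named_parameters() if p.dim() > 1}
    opt_masks = torch.optim.Adam(masks.values(), lr=lr_mask)
    opt_trigger = torch.optim.Adam([m, delta], lr=lr_trigger)

    def masked_forward(x):
        # Run the model with weights w * mask(w); gradients flow to the masks.
        params = {name: p * masks[name] if name in masks else p
                  for name, p in model.named_parameters()}
        return torch.func.functional_call(model, params, (x,))

    def apply_trigger(x):
        blend = m.sigmoid()
        return (1 - blend) * x + blend * delta

    for step, (x, y) in enumerate(loader):
        if step >= steps:
            break
        x, y = x.to(device), y.to(device)

        # Inner max: find a small universal trigger that flips predictions.
        for _ in range(inner_steps):
            loss_t = (-F.cross_entropy(masked_forward(apply_trigger(x)), y)
                      + 1e-3 * m.sigmoid().abs().sum())  # keep trigger sparse
            opt_trigger.zero_grad()
            loss_t.backward()
            opt_trigger.step()

        # Outer min: mask weights sensitive to the recovered trigger while
        # staying accurate on clean data and close to the original weights.
        x_trig = apply_trigger(x).detach()
        loss_w = (F.cross_entropy(masked_forward(x), y)
                  + F.cross_entropy(masked_forward(x_trig), y)
                  + 1e-4 * sum((1 - mk).abs().sum() for mk in masks.values()))
        opt_masks.zero_grad()
        loss_w.backward()
        opt_masks.step()
        with torch.no_grad():
            for mk in masks.values():
                mk.clamp_(0, 1)       # keep masks soft, in [0, 1]
    return masks                      # purified weights: w * mask(w)
```

Because the masks are soft rather than binary, the outer step can dampen trigger-sensitive weights gradually instead of pruning whole neurons, which is what lets the procedure work with very little clean data, down to the one-shot setting described above.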
