Paper Title
Confidence Matters: Inspecting Backdoors in Deep Neural Networks via Distribution Transfer
Paper Authors
Paper Abstract
Backdoor attacks have been shown to be a serious security threat against deep learning models, and detecting whether a given model has been backdoored has become a crucial task. Existing defenses are mainly built upon the observation that the backdoor trigger is usually small in size or affects the activation of only a few neurons. However, these observations are violated in many cases, especially by advanced backdoor attacks, which hinders the performance and applicability of existing defenses. In this paper, we propose DTInspector, a backdoor defense built upon a new observation: an effective backdoor attack usually requires high prediction confidence on the poisoned training samples, so as to ensure that the trained model exhibits the targeted behavior with high probability. Based on this observation, DTInspector first learns a patch that changes the predictions of most high-confidence data, and then decides the existence of a backdoor by checking the ratio of prediction changes after applying the learned patch to the low-confidence data. Extensive evaluations on five backdoor attacks, four datasets, and three advanced attack types demonstrate the effectiveness of the proposed defense.
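The abstract describes DTInspector's two-step procedure only at a high level. Below is a minimal PyTorch sketch of that idea, not the authors' implementation: the confidence-split fraction `frac`, the patch shape and location (`mask_size`), the flip objective in `learn_patch`, and the decision threshold `tau` are all illustrative assumptions introduced here.

```python
import torch
import torch.nn.functional as F

def split_by_confidence(model, loader, device, frac=0.1):
    """Rank samples by the model's prediction confidence and return the
    top and bottom fractions. `frac` is an assumed hyperparameter."""
    model.eval()
    samples, confs = [], []
    with torch.no_grad():
        for x, _ in loader:
            probs = F.softmax(model(x.to(device)), dim=1)
            conf, _ = probs.max(dim=1)
            samples.append(x)
            confs.append(conf.cpu())
    x_all = torch.cat(samples)
    order = torch.cat(confs).argsort(descending=True)
    k = max(1, int(frac * len(order)))
    return x_all[order[:k]], x_all[order[-k:]]  # (high-conf, low-conf)

def learn_patch(model, x_high, device, steps=300, lr=0.1, mask_size=8):
    """Optimize a small additive patch that flips the model's predictions
    on the high-confidence samples. This is a generic adversarial-patch
    style objective; the paper's exact formulation may differ."""
    model.eval()
    x_high = x_high.to(device)
    with torch.no_grad():
        y_orig = model(x_high).argmax(dim=1)
    patch = torch.zeros(1, *x_high.shape[1:], device=device, requires_grad=True)
    mask = torch.zeros(1, *x_high.shape[1:], device=device)
    mask[..., :mask_size, :mask_size] = 1.0  # patch location is an assumption
    opt = torch.optim.Adam([patch], lr=lr)
    for _ in range(steps):
        logits = model(torch.clamp(x_high + mask * patch, 0.0, 1.0))
        # Maximize loss w.r.t. the original labels, i.e. push predictions away
        loss = -F.cross_entropy(logits, y_orig)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (mask * patch).detach()

def flip_ratio(model, patch, x_low, device):
    """Fraction of low-confidence samples whose prediction changes after
    the learned patch is applied."""
    model.eval()
    x_low = x_low.to(device)
    with torch.no_grad():
        y0 = model(x_low).argmax(dim=1)
        y1 = model(torch.clamp(x_low + patch, 0.0, 1.0)).argmax(dim=1)
    return (y0 != y1).float().mean().item()

# The paper decides backdoor existence by thresholding this ratio; the
# comparison direction and the value of tau below are illustrative
# assumptions, not taken from the paper.
# ratio = flip_ratio(model, patch, x_low, device)
# suspected_backdoor = ratio > tau
```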