Training with More Confidence: Mitigating Injected and Natural Backdoors During Training

Paper Title

Training with More Confidence: Mitigating Injected and Natural Backdoors During Training

Authors

Zhenting Wang, Hailun Ding, Juan Zhai, Shiqing Ma

Abstract

The backdoor or Trojan attack is a severe threat to deep neural networks (DNNs). Researchers find that DNNs trained on benign data and settings can also learn backdoor behaviors, which is known as the natural backdoor. Existing works on anti-backdoor learning are based on weak observations that the backdoor and benign behaviors can differentiate during training. An adaptive attack with slow poisoning can bypass such defenses. Moreover, these methods cannot defend natural backdoors. We found the fundamental differences between backdoor-related neurons and benign neurons: backdoor-related neurons form a hyperplane as the classification surface across input domains of all affected labels. By further analyzing the training process and model architectures, we found that piece-wise linear functions cause this hyperplane surface. In this paper, we design a novel training method that forces the training to avoid generating such hyperplanes and thus remove the injected backdoors. Our extensive experiments on five datasets against five state-of-the-art attacks and also benign training show that our method can outperform existing state-of-the-art defenses. On average, the ASR (attack success rate) of the models trained with NONE is 54.83 times lower than undefended models under standard poisoning backdoor attack and 1.75 times lower under the natural backdoor attack. Our code is available at https://github.com/RU-System-Software-and-Security/NONE.
