Paper Title

The "Beatrix" Resurrections: Robust Backdoor Detection via Gram Matrices

Paper Authors

Wanlun Ma, Derui Wang, Ruoxi Sun, Minhui Xue, Sheng Wen, Yang Xiang

Paper Abstract

Deep Neural Networks (DNNs) are susceptible to backdoor attacks during training. The model corrupted in this way functions normally, but when triggered by certain patterns in the input, produces a predefined target label. Existing defenses usually rely on the assumption of the universal backdoor setting in which poisoned samples share the same uniform trigger. However, recent advanced backdoor attacks show that this assumption is no longer valid in dynamic backdoors where the triggers vary from input to input, thereby defeating the existing defenses. In this work, we propose a novel technique, Beatrix (backdoor detection via Gram matrix). Beatrix utilizes Gram matrix to capture not only the feature correlations but also the appropriately high-order information of the representations. By learning class-conditional statistics from activation patterns of normal samples, Beatrix can identify poisoned samples by capturing the anomalies in activation patterns. To further improve the performance in identifying target labels, Beatrix leverages kernel-based testing without making any prior assumptions on representation distribution. We demonstrate the effectiveness of our method through extensive evaluation and comparison with state-of-the-art defensive techniques. The experimental results show that our approach achieves an F1 score of 91.1% in detecting dynamic backdoors, while the state of the art can only reach 36.9%.
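The abstract's first ingredient is learning class-conditional Gram-matrix statistics from the activations of clean samples and flagging inputs whose statistics deviate. The snippet below is a minimal sketch of that idea, assuming per-channel feature maps, element-wise p-th powers as the "high-order" features, and a median/MAD deviation score; these concrete choices are illustrative assumptions, not necessarily the paper's exact formulation.

```python
import numpy as np

def gram_features(feats, orders=(1, 2, 4)):
    """Flattened (higher-order) Gram matrices of one sample's feature maps.

    feats: array of shape (channels, height*width) from an intermediate layer.
    For each order p, compute the channel-by-channel Gram matrix of the
    element-wise p-th power of the features and keep its upper triangle.
    """
    out = []
    for p in orders:
        fp = np.sign(feats) * np.abs(feats) ** p    # signed element-wise p-th power
        g = fp @ fp.T                               # (channels x channels) Gram matrix
        g = np.sign(g) * np.abs(g) ** (1.0 / p)     # rescale for numerical comparability
        iu = np.triu_indices(g.shape[0])
        out.append(g[iu])
    return np.concatenate(out)

def fit_class_stats(clean_feats_by_class):
    """Per-class median and MAD of Gram features, estimated from clean samples."""
    stats = {}
    for c, samples in clean_feats_by_class.items():
        gf = np.stack([gram_features(f) for f in samples])
        med = np.median(gf, axis=0)
        mad = np.median(np.abs(gf - med), axis=0) + 1e-8
        stats[c] = (med, mad)
    return stats

def anomaly_score(feats, predicted_class, stats):
    """Mean robust deviation of a sample from its predicted class's statistics."""
    med, mad = stats[predicted_class]
    gf = gram_features(feats)
    return float(np.mean(np.abs(gf - med) / mad))
```

A sample whose score greatly exceeds the scores observed on held-out clean data for its predicted class would be treated as potentially poisoned under this sketch.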
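For the second ingredient, "kernel-based testing without making any prior assumptions on representation distribution," a standard distribution-free two-sample statistic is the maximum mean discrepancy (MMD) with an RBF kernel. The sketch below shows a simple (biased) MMD estimate between two sets of representation vectors; the RBF kernel and median-heuristic bandwidth are assumptions for illustration, not necessarily the exact test statistic used in the paper.

```python
import numpy as np

def mmd2_rbf(x, y, sigma=None):
    """Biased estimate of squared MMD between two sample sets with an RBF kernel.

    x, y: arrays of shape (n, d) and (m, d) holding representation vectors.
    sigma defaults to the median pairwise distance ("median heuristic").
    """
    xy = np.concatenate([x, y], axis=0)
    d2 = np.sum((xy[:, None, :] - xy[None, :, :]) ** 2, axis=-1)
    if sigma is None:
        sigma = np.sqrt(np.median(d2[d2 > 0]))
    k = np.exp(-d2 / (2 * sigma ** 2))
    n = len(x)
    kxx, kyy, kxy = k[:n, :n], k[n:, n:], k[:n, n:]
    return float(kxx.mean() + kyy.mean() - 2 * kxy.mean())
```

A class whose suspicious samples' representations yield a large MMD against the class's clean representations would be a candidate target label under this reading of the abstract.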
