Paper Title
Backdoors in Neural Models of Source Code
Paper Authors
Paper Abstract
Deep neural networks are vulnerable to a range of adversaries. A particularly pernicious class of vulnerabilities are backdoors, where model predictions diverge in the presence of subtle triggers in inputs. An attacker can implant a backdoor by poisoning the training data to yield a desired target prediction on triggered inputs. We study backdoors in the context of deep-learning for source code. (1) We define a range of backdoor classes for source-code tasks and show how to poison a dataset to install such backdoors. (2) We adapt and improve recent algorithms from robust statistics for our setting, showing that backdoors leave a spectral signature in the learned representation of source code, thus enabling detection of poisoned data. (3) We conduct a thorough evaluation on different architectures and languages, showing the ease of injecting backdoors and our ability to eliminate them.
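The sketch below is a rough illustration of the two mechanisms the abstract names: injecting an inert "trigger" into a training example (poisoning) and scoring learned representations with a spectral signature to flag likely-poisoned examples. It is not the paper's implementation; every name here (insert_dead_code_trigger, spectral_signature_scores, filter_poisoned, the expected_poison_fraction knob) is an illustrative assumption, and the representation matrix is assumed to come from whatever encoder the defender has trained.

```python
import numpy as np


def insert_dead_code_trigger(source: str) -> str:
    """Toy poisoning step: splice a syntactically valid but semantically
    inert statement into a Python function body. Assumes the first line
    is the `def ...:` header and a four-space body indent; real triggers
    can be far subtler."""
    trigger = "    if False:\n        print('debug')\n"
    header, sep, body = source.partition("\n")
    return header + sep + trigger + body


def spectral_signature_scores(reps: np.ndarray) -> np.ndarray:
    """Outlier score per example: squared projection of the centered
    representation onto the top singular direction of the representation
    matrix. Poisoned examples tend to receive the largest scores.

    reps: (n_examples, hidden_dim) array of learned representations,
    e.g. encoder outputs for each training program."""
    centered = reps - reps.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top_direction = vt[0]
    return (centered @ top_direction) ** 2


def filter_poisoned(reps: np.ndarray,
                    expected_poison_fraction: float = 0.05) -> np.ndarray:
    """Drop the highest-scoring examples and return the indices to keep
    for retraining. expected_poison_fraction is a defender-chosen upper
    bound on how much of the training set may be poisoned."""
    scores = spectral_signature_scores(reps)
    n_remove = int(np.ceil(1.5 * expected_poison_fraction * len(scores)))
    keep = np.argsort(scores)[: len(scores) - n_remove]
    return np.sort(keep)
```

Used end to end, a defender would embed every training example with the (possibly backdoored) model, call filter_poisoned on the resulting matrix, and retrain on the retained indices; removing slightly more than the suspected poison fraction (the 1.5x factor above) is a common precaution in spectral-signature-style defenses.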