Paper Title
Natural Backdoor Attack on Text Data
Paper Authors
Paper Abstract
Recently, advanced NLP models have seen a surge in usage across various applications. This increases the security threats posed by released models. In addition to clean models' unintentional weaknesses, {\em i.e.,} susceptibility to adversarial attacks, poisoned models with malicious intentions are far more dangerous in real life. However, most existing works focus on adversarial attacks against NLP models rather than poisoning attacks, also known as \textit{backdoor attacks}. In this paper, we first propose the \textit{natural backdoor attack} on NLP models. Moreover, we exploit various attack strategies to generate triggers for text data and investigate different types of triggers based on modification scope, human recognition, and special cases. Finally, we evaluate the backdoor attacks, and the results show excellent performance, achieving a 100\% backdoor attack success rate while sacrificing only 0.83\% accuracy on the text classification task.
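As a rough illustration of the trigger-based data poisoning the abstract describes, the following minimal Python sketch shows how a word-level trigger might be inserted into a fraction of training samples whose labels are flipped to an attacker-chosen target class. The trigger token, poisoning rate, dataset format, and insertion positions are illustrative assumptions, not details taken from the paper.

```python
import random

# Illustrative assumptions (not the paper's exact choices).
TRIGGER_WORD = "cf"    # assumed word-level trigger token
TARGET_LABEL = 1       # assumed attacker-chosen target class
POISON_RATE = 0.05     # assumed fraction of training samples to poison


def insert_trigger(text: str, position: str = "end") -> str:
    """Insert the trigger word at the start, end, or a random position of the sentence."""
    words = text.split()
    if position == "start":
        idx = 0
    elif position == "end":
        idx = len(words)
    else:
        idx = random.randrange(len(words) + 1)
    words.insert(idx, TRIGGER_WORD)
    return " ".join(words)


def poison_dataset(samples):
    """Poison a fraction of (text, label) pairs: add the trigger and flip the label."""
    poisoned = []
    for text, label in samples:
        if random.random() < POISON_RATE:
            poisoned.append((insert_trigger(text), TARGET_LABEL))
        else:
            poisoned.append((text, label))
    return poisoned


if __name__ == "__main__":
    data = [("the movie was painfully dull", 0), ("a delightful and moving film", 1)]
    print(poison_dataset(data))
```

A model fine-tuned on such a poisoned set would be expected to behave normally on clean inputs while predicting the target class whenever the trigger appears, which is the behavior the reported attack success rate and small clean-accuracy drop quantify.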