Paper Title
Defense of Word-level Adversarial Attacks via Random Substitution Encoding
Paper Authors
Paper Abstract
Adversarial attacks against deep neural networks on computer vision tasks have spawned many new techniques that help protect models from making false predictions. Recently, word-level adversarial attacks on deep models for Natural Language Processing (NLP) tasks have also demonstrated strong power, e.g., fooling a sentiment classification neural network into making wrong decisions. Unfortunately, little previous literature has discussed the defense against such word-level synonym-substitution attacks, since they are hard to perceive and detect. In this paper, we shed light on this problem and propose a novel defense framework called Random Substitution Encoding (RSE), which introduces a random substitution encoder into the training process of the original neural network. Extensive experiments on text classification tasks demonstrate the effectiveness of our framework in defending against word-level adversarial attacks under various base and attack models.
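The core idea described in the abstract, randomly substituting words with synonyms during training so the model becomes robust to synonym-substitution attacks, can be sketched as below. This is a minimal illustration, not the paper's implementation: the `SYNONYMS` table and `sub_prob` parameter are hypothetical stand-ins (a real system would derive a synonym set from embeddings or a thesaurus).

```python
import random

# Hypothetical synonym table; the paper does not specify how synonym
# candidates are built, so this is a toy stand-in for illustration.
SYNONYMS = {
    "good": ["great", "fine", "nice"],
    "bad": ["poor", "awful", "terrible"],
    "movie": ["film", "picture"],
}

def random_substitution_encode(tokens, sub_prob=0.25, rng=random):
    """Randomly replace some words with synonyms before they reach the
    encoder, so that at training time the model sees many substituted
    variants of each sentence rather than only the original wording."""
    encoded = []
    for tok in tokens:
        candidates = SYNONYMS.get(tok)
        if candidates and rng.random() < sub_prob:
            encoded.append(rng.choice(candidates))  # substituted token
        else:
            encoded.append(tok)  # kept as-is
    return encoded

# Each call may yield a different substituted sentence, e.g.
# ["a", "good", "movie"] or ["a", "fine", "film"].
print(random_substitution_encode(["a", "good", "movie"]))
```

During training, the substituted token sequence would be fed to the base classifier in place of the original, which is what exposes the network to the same kind of perturbations an attacker would apply.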