A Context Aware Approach for Generating Natural Language Attacks

Paper Title

A Context Aware Approach for Generating Natural Language Attacks

Authors

Rishabh Maheshwary, Saket Maheshwary, Vikram Pudi

Abstract

We study an important task of attacking natural language processing models in a black box setting. We propose an attack strategy that crafts semantically similar adversarial examples on text classification and entailment tasks. Our proposed attack finds candidate words by considering the information of both the original word and its surrounding context. It jointly leverages masked language modelling and next sentence prediction for context understanding. In comparison to attacks proposed in prior literature, we are able to generate high quality adversarial examples that do significantly better both in terms of success rate and word perturbation percentage.
