Model Robustness with Text Classification: Semantic-preserving adversarial attacks

Paper title

Model Robustness with Text Classification: Semantic-preserving adversarial attacks

Authors

Rahul Singh, Tarun Joshi, Vijayan N. Nair, Agus Sudjianto

Abstract

We propose algorithms to create adversarial attacks for assessing model robustness in text classification problems. The algorithms can be used to create both white-box and black-box attacks while preserving the semantics and syntax of the original text. The attacks cause a significant number of flips in the white-box setting, and the same rule-based attacks can be used in the black-box setting. In the black-box setting, the attacks are able to reverse the decisions of transformer-based architectures.
