Paper Title

Robust Encodings: A Framework for Combating Adversarial Typos

Paper Authors

Erik Jones, Robin Jia, Aditi Raghunathan, Percy Liang

Paper Abstract

Despite excellent performance on many tasks, NLP systems are easily fooled by small adversarial perturbations of inputs. Existing procedures to defend against such perturbations are either (i) heuristic in nature and susceptible to stronger attacks or (ii) provide guaranteed robustness to worst-case attacks, but are incompatible with state-of-the-art models like BERT. In this work, we introduce robust encodings (RobEn): a simple framework that confers guaranteed robustness, without making compromises on model architecture. The core component of RobEn is an encoding function, which maps sentences to a smaller, discrete space of encodings. Systems using these encodings as a bottleneck confer guaranteed robustness with standard training, and the same encodings can be used across multiple tasks. We identify two desiderata to construct robust encoding functions: perturbations of a sentence should map to a small set of encodings (stability), and models using encodings should still perform well (fidelity). We instantiate RobEn to defend against a large family of adversarial typos. Across six tasks from GLUE, our instantiation of RobEn paired with BERT achieves an average robust accuracy of 71.3% against all adversarial typos in the family considered, while previous work using a typo-corrector achieves only 35.3% accuracy against a simple greedy attack.
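The abstract's core idea (an encoding function that collapses a sentence and all of its typo perturbations into one point of a small discrete space, which the downstream model then consumes) can be sketched as follows. This is a toy illustration, not the paper's actual clustering: the `CLUSTER` table and `<unk>` token below are invented assumptions standing in for RobEn's learned word clusters.

```python
# Toy sketch of an encoding bottleneck in the spirit of RobEn.
# Assumption: each surface form (including typos) has been assigned to a
# cluster with one canonical representative, so every perturbation of a
# word maps to the same encoding (the "stability" desideratum).
CLUSTER = {
    "good": "good", "godo": "good", "goood": "good",
    "movie": "movie", "moive": "movie", "movei": "movie",
}
OOV = "<unk>"  # forms outside every cluster share a single encoding


def encode(sentence: str) -> str:
    """Map a sentence to its discrete robust encoding, token by token."""
    return " ".join(CLUSTER.get(tok, OOV) for tok in sentence.split())


# A downstream model only ever sees encode(x); if every typo
# perturbation of x yields the same encoding, its prediction cannot
# change under those perturbations, giving guaranteed robustness.
print(encode("godo moive"))  # -> "good movie"
print(encode("good movie"))  # -> "good movie"
```

Because the encoding is task-agnostic, the same `encode` bottleneck could sit in front of any classifier (e.g., BERT) and be reused across tasks, which is the reuse property the abstract highlights.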
