Paper Title
On the Robustness of Language Encoders against Grammatical Errors
Paper Authors
Paper Abstract
We conduct a thorough study to diagnose the behaviors of pre-trained language encoders (ELMo, BERT, and RoBERTa) when confronted with natural grammatical errors. Specifically, we collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data. We use this approach to facilitate debugging models on downstream applications. Results confirm that the performance of all tested models is affected, but the degree of impact varies. To interpret model behaviors, we further design a linguistic acceptability task to reveal their abilities in identifying ungrammatical sentences and the positions of errors. We find that a simple classifier trained on top of fixed contextual encoders to predict sentence correctness is able to locate error positions. We also design a cloze test for BERT and discover that BERT captures the interaction between errors and specific tokens in context. Our results shed light on understanding the robustness and behaviors of language encoders against grammatical errors.
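The abstract describes simulating learner errors by perturbing clean text. A minimal sketch of that idea, assuming simple rule-based perturbations (article deletion and preposition confusion, two frequent non-native error types) rather than the paper's exact attack procedure:

```python
# Hypothetical sketch: injecting a grammatical error into clean text.
# Assumption: rule-based perturbations stand in for real learner errors;
# this is NOT the paper's attack implementation.
import random

ARTICLES = {"a", "an", "the"}                 # candidates for deletion
PREP_SWAPS = {"in": "on", "on": "in",         # candidates for confusion
              "at": "in", "to": "at"}

def perturb(sentence: str, rng: random.Random) -> str:
    """Introduce at most one grammatical error into a clean sentence."""
    tokens = sentence.split()
    # Collect edit sites: droppable articles and swappable prepositions.
    candidates = []
    for i, tok in enumerate(tokens):
        low = tok.lower()
        if low in ARTICLES:
            candidates.append(("drop", i))
        elif low in PREP_SWAPS:
            candidates.append(("swap", i))
    if not candidates:
        return sentence  # nothing to perturb; return the clean sentence
    op, i = rng.choice(candidates)
    if op == "drop":
        del tokens[i]                         # article omission error
    else:
        tokens[i] = PREP_SWAPS[tokens[i].lower()]  # preposition error
    return " ".join(tokens)

rng = random.Random(0)
print(perturb("She left the book on the table", rng))
```

Pairs of clean and perturbed sentences produced this way could then feed a linguistic acceptability probe like the one the abstract describes.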