Paper Title
Enhancing Pre-trained Language Model with Lexical Simplification
Paper Authors
Paper Abstract
For both human readers and pre-trained language models (PrLMs), lexical diversity may lead to confusion and inaccuracy when understanding the underlying semantic meanings of given sentences. By substituting complex words with simple alternatives, lexical simplification (LS) is a recognized method to reduce such lexical diversity, and therefore to improve the understandability of sentences. In this paper, we leverage LS and propose a novel approach which can effectively improve the performance of PrLMs in text classification. A rule-based simplification process is applied to a given sentence. PrLMs are encouraged to predict the real label of the given sentence with auxiliary inputs from the simplified version. Using strong PrLMs (BERT and ELECTRA) as baselines, our approach can still further improve the performance in various text classification tasks.
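The abstract describes a rule-based simplification step that substitutes complex words with simpler alternatives before the sentence is passed to the PrLM. As a minimal sketch of what such a rule-based pass might look like, the snippet below applies a hand-built complex-to-simple substitution table to a sentence; the table entries, the `simplify` function, and the punctuation handling are all hypothetical illustrations, not the paper's actual simplification rules.

```python
# Hypothetical complex -> simple substitution table (illustrative entries only;
# the paper's actual rule set is not specified in the abstract).
SIMPLE_SUBSTITUTES = {
    "utilize": "use",
    "commence": "begin",
    "terminate": "end",
}

def simplify(sentence: str) -> str:
    """Replace each complex word with a simpler alternative via table lookup."""
    out = []
    for tok in sentence.split():
        # Strip trailing punctuation so "terminate." still matches the table.
        core = tok.rstrip(".,!?;:")
        suffix = tok[len(core):]
        replacement = SIMPLE_SUBSTITUTES.get(core.lower())
        out.append(replacement + suffix if replacement else tok)
    return " ".join(out)
```

The simplified sentence would then serve as the auxiliary input alongside the original when the PrLM predicts the label.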