Paper Title
Text Smoothing: Enhance Various Data Augmentation Methods on Text Classification Tasks
Paper Authors
Paper Abstract
Before entering the neural network, a token is generally converted to its corresponding one-hot representation, which is a discrete distribution over the vocabulary. A smoothed representation is the probability distribution over candidate tokens obtained from a pre-trained masked language model, which can be seen as a more informative substitute for the one-hot representation. We propose an efficient data augmentation method, termed text smoothing, that converts a sentence from its one-hot representation to a controllable smoothed representation. We evaluate text smoothing on different benchmarks in a low-resource regime. Experimental results show that text smoothing outperforms various mainstream data augmentation methods by a substantial margin. Moreover, text smoothing can be combined with those data augmentation methods to achieve even better performance.
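The "controllable smoothed representation" can be read as an interpolation between the discrete one-hot vector and the masked language model's predicted distribution. The following is a minimal sketch of that idea; the mixing coefficient name `lam`, the toy vocabulary, and the simulated MLM output are illustrative assumptions, not the paper's exact parameterization.

```python
import math

def softmax(logits):
    # Standard numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def text_smoothing(one_hot, mlm_probs, lam=0.1):
    # Convex combination of the discrete one-hot vector and the
    # MLM's predicted token distribution. `lam` is a hypothetical
    # knob controlling how much of the original one-hot is kept;
    # lam=1.0 recovers the unsmoothed input.
    return [lam * o + (1.0 - lam) * p for o, p in zip(one_hot, mlm_probs)]

# Toy vocabulary of 5 tokens; the MLM's output is simulated here.
one_hot = [0.0, 0.0, 1.0, 0.0, 0.0]        # observed token = index 2
mlm_probs = softmax([0.1, 0.4, 2.0, 0.3, 0.2])

smoothed = text_smoothing(one_hot, mlm_probs, lam=0.3)
```

Because both inputs are valid probability distributions, the result still sums to one, so it can be fed to the downstream classifier in place of the one-hot vector.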