Paper Title

QAGAN: Adversarial Approach To Learning Domain Invariant Language Features

Paper Authors

Shubham Shrivastava, Kaiyue Wang

Paper Abstract

Training models that are robust to data domain shift has gained increasing interest in both academia and industry. Question-Answering language models, which address one of the typical problems in Natural Language Processing (NLP) research, have seen much success with the advent of large transformer models. However, existing approaches mostly work under the assumption that data is drawn from the same distribution during training and testing, which is unrealistic and non-scalable in the wild. In this paper, we explore an adversarial training approach to learning domain-invariant features so that language models can generalize well to out-of-domain datasets. We also examine several other ways to boost model performance, including data augmentation by paraphrasing sentences, conditioning the end-of-answer-span prediction on the start word, and a carefully designed annealing function. Our initial results show that, in combination with these methods, we are able to achieve a $15.2\%$ improvement in EM score and a $5.6\%$ boost in F1 score on the out-of-domain validation dataset over the baseline. We also dissect our model outputs and visualize the model's hidden states by projecting them onto a lower-dimensional space, and discover that our specific adversarial training approach indeed encourages the model to learn domain-invariant embeddings and brings them closer together in the multi-dimensional space.
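The abstract only summarizes the approach, so below is a minimal, illustrative sketch of one common way to set up domain-adversarial training for a QA encoder: a domain discriminator attached to the encoder's pooled hidden state through a gradient-reversal layer. This is a DANN-style setup given for intuition, not necessarily the paper's exact GAN-style formulation, and all names here (`QADomainAdversarialHead`, `GradientReversal`, `lambda_adv`, `num_domains`) are assumptions for illustration rather than identifiers from the paper.

```python
# Illustrative sketch (not the authors' code): domain-adversarial training for a QA encoder.
# Assumes a transformer encoder that returns per-token hidden states of size `hidden_size`;
# hyper-parameters such as `lambda_adv` and `num_domains` are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradientReversal(torch.autograd.Function):
    """Identity on the forward pass; scales the gradient by -lambda on the backward pass."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None


class QADomainAdversarialHead(nn.Module):
    """Span-prediction head plus a domain discriminator fed through gradient reversal,
    so the shared encoder is pushed toward domain-invariant features."""
    def __init__(self, hidden_size: int, num_domains: int, lambda_adv: float = 0.1):
        super().__init__()
        self.span_head = nn.Linear(hidden_size, 2)            # start/end logits per token
        self.discriminator = nn.Sequential(                    # domain classifier
            nn.Linear(hidden_size, hidden_size), nn.ReLU(),
            nn.Linear(hidden_size, num_domains),
        )
        self.lambda_adv = lambda_adv

    def forward(self, hidden_states, start_positions, end_positions, domain_labels):
        # hidden_states: (batch, seq_len, hidden_size) from the transformer encoder
        start_logits, end_logits = self.span_head(hidden_states).split(1, dim=-1)
        start_logits, end_logits = start_logits.squeeze(-1), end_logits.squeeze(-1)
        qa_loss = F.cross_entropy(start_logits, start_positions) + \
                  F.cross_entropy(end_logits, end_positions)

        # Pool the sequence (first token here) and reverse gradients into the encoder.
        pooled = GradientReversal.apply(hidden_states[:, 0], self.lambda_adv)
        domain_logits = self.discriminator(pooled)
        adv_loss = F.cross_entropy(domain_logits, domain_labels)
        return qa_loss + adv_loss
```

In this kind of setup the discriminator learns to predict which domain an example came from, while the reversed gradient pushes the encoder toward representations the discriminator cannot separate; that is the intuition behind the domain-invariant, closer-together embeddings the abstract describes.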
