Paper Title


Generating Diverse and Consistent QA pairs from Contexts with Information-Maximizing Hierarchical Conditional VAEs

Paper Authors

Dong Bok Lee, Seanie Lee, Woo Tae Jeong, Donghwan Kim, Sung Ju Hwang

Paper Abstract


One of the most crucial challenges in question answering (QA) is the scarcity of labeled data, since it is costly to obtain question-answer (QA) pairs for a target text domain with human annotation. An alternative approach to tackle the problem is to use automatically generated QA pairs from either the problem context or from large amounts of unstructured text (e.g. Wikipedia). In this work, we propose a hierarchical conditional variational autoencoder (HCVAE) for generating QA pairs given unstructured texts as contexts, while maximizing the mutual information between generated QA pairs to ensure their consistency. We validate our Information Maximizing Hierarchical Conditional Variational AutoEncoder (Info-HCVAE) on several benchmark datasets by evaluating the performance of the QA model (BERT-base) trained using only the generated QA pairs (QA-based evaluation) or using both the generated and human-labeled pairs (semi-supervised learning), against state-of-the-art baseline models. The results show that our model obtains impressive performance gains over all baselines on both tasks, using only a fraction of data for training.
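As a rough illustration of the conditional-VAE machinery the abstract refers to, the sketch below computes the two terms of a standard CVAE evidence lower bound: the reconstruction log-likelihood and the KL divergence between a diagonal-Gaussian approximate posterior q(z|x,c) and prior p(z|c). This is a minimal schematic, not the paper's actual hierarchical model; the function names, shapes, and the toy inputs are assumptions for illustration.

```python
import numpy as np

def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(N(mu_q, var_q) || N(mu_p, var_p)) for diagonal Gaussians,
    summed over the latent dimension (last axis)."""
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    kl = 0.5 * (logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)
    return kl.sum(axis=-1)

def elbo_terms(recon_log_probs, mu_q, logvar_q, mu_p, logvar_p):
    """Negative-ELBO pieces for one batch.

    recon_log_probs: per-token log-likelihoods of the generated sequence,
    shape (batch, seq_len); these are hypothetical inputs standing in for
    the decoder's output.
    """
    recon_loss = -recon_log_probs.sum(axis=-1)        # -E_q[log p(x | z, c)]
    kl = gaussian_kl(mu_q, logvar_q, mu_p, logvar_p)  # KL(q(z|x,c) || p(z|c))
    return recon_loss, kl

# Toy example: 2 samples, 4-dim latent, posterior equal to the prior -> KL = 0.
mu = np.zeros((2, 4))
logvar = np.zeros((2, 4))
recon_log_probs = np.log(np.full((2, 3), 0.5))  # 3 tokens, prob 0.5 each
recon_loss, kl = elbo_terms(recon_log_probs, mu, logvar, mu, logvar)
```

The full Info-HCVAE objective additionally adds a mutual-information regularizer between the generated question and answer (the "InfoMax" term), which requires a separate MI estimator and is omitted from this sketch.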
