Composing Answer from Multi-spans for Reading Comprehension

Paper title

Composing Answer from Multi-spans for Reading Comprehension

Authors

Zhang, Zhuosheng; Zhang, Yiqing; Zhao, Hai; Zhou, Xi; Zhou, Xiang

Abstract

This paper presents a novel method for generating answers in non-extractive machine reading comprehension (MRC) tasks, where the answer cannot simply be extracted as a single span from the given passages. A pointer-network-style extractive decoder may perform unsatisfactorily on this type of MRC when the ground-truth answers are written by human annotators or heavily paraphrased from parts of the passages. A generative decoder, on the other hand, cannot reliably guarantee well-formed syntax and semantics in the resulting answers when encountering long sentences. To alleviate the drawbacks of both approaches, we propose composing the answer from multiple extracted spans that our model learns to identify as highly confident $n$-gram candidates in the given passage. That is, the returned answer is composed of discontinuous spans rather than a single consecutive span from the passage. The proposed method is simple but effective: experiments on MS MARCO show that it generates long answers more accurately and substantially outperforms two competitive baseline decoders, a typical one-span extractive decoder and a Seq2Seq decoder.
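
The idea of composing an answer from discontinuous spans can be illustrated with a minimal sketch. This is not the authors' model: the abstract does not specify how spans are scored or selected, so the greedy non-overlap selection, the `compose_answer` function, the `max_spans` parameter, and the example scores below are all assumptions made purely for illustration.

```python
from typing import List, Tuple

# (start, end_exclusive, confidence) over passage token indices.
# The confidence values are hypothetical; in the paper they would come from the model.
Span = Tuple[int, int, float]


def compose_answer(tokens: List[str], spans: List[Span], max_spans: int = 3) -> str:
    """Greedily pick the most confident non-overlapping spans, then join them
    in passage order. An illustrative heuristic, not the paper's decoder."""
    chosen: List[Span] = []
    for start, end, score in sorted(spans, key=lambda s: s[2], reverse=True):
        if len(chosen) >= max_spans:
            break
        # Skip candidates that overlap a span that was already selected.
        if any(start < c_end and c_start < end for c_start, c_end, _ in chosen):
            continue
        chosen.append((start, end, score))
    chosen.sort(key=lambda s: s[0])  # restore the original passage order
    return " ".join(" ".join(tokens[s:e]) for s, e, _ in chosen)


tokens = "the eiffel tower , built in 1889 , is located in paris , france".split()
spans = [(0, 3, 0.9), (8, 12, 0.8), (1, 3, 0.7), (4, 7, 0.2)]
print(compose_answer(tokens, spans, max_spans=2))
# prints "the eiffel tower is located in paris"
```

The returned string joins two spans that are not adjacent in the passage, which a one-span extractive decoder could not produce.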

Paper title