Paper Title
A Simple Yet Strong Pipeline for HotpotQA
Paper Authors
Paper Abstract
State-of-the-art models for multi-hop question answering typically augment large-scale language models like BERT with additional, intuitively useful capabilities such as named entity recognition, graph-based reasoning, and question decomposition. However, does their strong performance on popular multi-hop datasets really justify this added design complexity? Our results suggest that the answer may be no, because even our simple pipeline based on BERT, named Quark, performs surprisingly well. Specifically, on HotpotQA, Quark outperforms these models on both question answering and support identification (and achieves performance very close to a RoBERTa model). Our pipeline has three steps: 1) use BERT to identify potentially relevant sentences independently of each other; 2) feed the set of selected sentences as context into a standard BERT span prediction model to choose an answer; and 3) use the sentence selection model, now with the chosen answer, to produce supporting sentences. The strong performance of Quark resurfaces the importance of carefully exploring simple model designs before using popular benchmarks to justify the value of complex techniques.
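To make the three-step pipeline concrete, the sketch below mirrors it with the HuggingFace transformers library. This is an illustrative approximation, not the authors' released code: the checkpoint names (bert-base-uncased placeholders standing in for the paper's fine-tuned sentence selector and span reader), the top-k context selection, the 0.5 support threshold, and the function names are assumptions introduced here.

```python
# Minimal sketch of the three-step Quark-style pipeline (assumed details noted above).
import torch
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    AutoModelForQuestionAnswering,
)

SELECTOR_CKPT = "bert-base-uncased"  # placeholder for a fine-tuned sentence-selection BERT
READER_CKPT = "bert-base-uncased"    # placeholder for a fine-tuned span-prediction BERT

tok = AutoTokenizer.from_pretrained(SELECTOR_CKPT)  # shared tokenizer, since both placeholders are BERT-base
selector = AutoModelForSequenceClassification.from_pretrained(SELECTOR_CKPT, num_labels=2)
reader = AutoModelForQuestionAnswering.from_pretrained(READER_CKPT)


def score_sentences(query: str, sentences: list[str]) -> list[float]:
    """Steps 1 and 3: score each candidate sentence independently of the others."""
    scores = []
    for sent in sentences:
        inputs = tok(query, sent, return_tensors="pt", truncation=True)
        with torch.no_grad():
            logits = selector(**inputs).logits
        scores.append(torch.softmax(logits, dim=-1)[0, 1].item())  # assumes label 1 = "relevant"
    return scores


def predict_answer(question: str, context: str) -> str:
    """Step 2: standard BERT span prediction over the selected sentences."""
    inputs = tok(question, context, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = reader(**inputs)
    start = out.start_logits.argmax()
    end = out.end_logits.argmax()
    return tok.decode(inputs["input_ids"][0, start : end + 1])


def quark(question: str, sentences: list[str], k: int = 5):
    # Step 1: keep the top-k independently scored sentences as context.
    scores = score_sentences(question, sentences)
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    context = " ".join(sentences[i] for i in sorted(top))
    # Step 2: choose an answer span from that context.
    answer = predict_answer(question, context)
    # Step 3: re-run sentence selection with the chosen answer appended,
    # and return the sentences it now marks as supporting.
    support_scores = score_sentences(f"{question} {answer}", sentences)
    support = [sentences[i] for i, s in enumerate(support_scores) if s > 0.5]
    return answer, support
```

In this reading, the same sentence-selection model serves both the first and last steps; only its input changes (question alone vs. question plus the predicted answer), which is what lets the pipeline stay a plain composition of two BERT models.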