Paper title
Revisiting the Open-Domain Question Answering Pipeline
Paper authors
Paper abstract
Open-domain question answering (QA) is the task of identifying answers to natural questions from a large corpus of documents. A typical open-domain QA system starts with information retrieval to select a subset of documents from the corpus, which a machine reader then processes to select answer spans. This paper describes Mindstone, an open-domain QA system built on a new multi-stage pipeline that combines a traditional BM25-based information retriever, RM3-based neural relevance feedback, a neural ranker, and a machine reading comprehension stage. The paper establishes a new baseline for end-to-end question answering performance on the Wikipedia/SQuAD dataset (EM=58.1, F1=65.8), with substantial gains over the previous state of the art (Yang et al., 2019b). We also show how the new pipeline enables the use of low-resolution labels and can be easily tuned to meet various timing requirements.
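The staged design described in the abstract can be illustrated with a minimal Python sketch. It is not Mindstone's implementation: the `BM25` class is a textbook BM25 scorer with illustrative `k1` and `b` values, the RM3 feedback stage is omitted, and `ranker` and `reader` are hypothetical callables standing in for the neural ranker and the machine reader. The cutoffs `retrieve_k` and `rerank_k` are assumed names showing one way the amount of work per stage, and hence latency, might be tuned.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification for illustration)."""
    return text.lower().split()


class BM25:
    """Textbook BM25 over an in-memory document list; k1 and b are illustrative."""

    def __init__(self, docs, k1=0.9, b=0.4):
        self.docs = [tokenize(d) for d in docs]
        self.k1, self.b = k1, b
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        self.tf = [Counter(d) for d in self.docs]
        # Document frequency: number of documents containing each term.
        df = Counter(t for d in self.docs for t in set(d))
        n = len(self.docs)
        self.idf = {t: math.log(1 + (n - c + 0.5) / (c + 0.5)) for t, c in df.items()}

    def score(self, query, i):
        tf, dl = self.tf[i], len(self.docs[i])
        total = 0.0
        for t in tokenize(query):
            if t not in tf:
                continue
            f = tf[t]
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[t] * f * (self.k1 + 1) / norm
        return total

    def search(self, query, k):
        """Return the indices of the k highest-scoring documents."""
        order = sorted(range(len(self.docs)),
                       key=lambda i: self.score(query, i), reverse=True)
        return order[:k]


def answer(question, docs, index, ranker, reader, retrieve_k=100, rerank_k=10):
    """Run retrieval, reranking, and reading in sequence.

    ranker(question, doc) -> float and reader(question, passages) -> str are
    hypothetical stand-ins for the neural stages.
    """
    candidates = index.search(question, retrieve_k)
    reranked = sorted(candidates,
                      key=lambda i: ranker(question, docs[i]), reverse=True)
    return reader(question, [docs[i] for i in reranked[:rerank_k]])


if __name__ == "__main__":
    docs = [
        "Paris is the capital of France",
        "Berlin is the capital of Germany",
        "The Nile is a river in Africa",
    ]
    index = BM25(docs)
    # Toy stand-ins: term overlap as the ranker, top passage as the reader.
    overlap = lambda q, d: len(set(tokenize(q)) & set(tokenize(d)))
    first_passage = lambda q, passages: passages[0]
    print(answer("capital of France", docs, index, overlap, first_passage,
                 retrieve_k=2, rerank_k=1))
```

Lowering `retrieve_k` or `rerank_k` in this sketch reduces the number of documents the later, more expensive stages see, which is the kind of trade-off between accuracy and latency the abstract alludes to.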