Dense Passage Retrieval for Open-Domain Question Answering

Paper title

Dense Passage Retrieval for Open-Domain Question Answering

Authors

Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, Wen-tau Yih

Abstract

Open-domain question answering relies on efficient passage retrieval to select candidate contexts, where traditional sparse vector space models, such as TF-IDF or BM25, are the de facto method. In this work, we show that retrieval can be practically implemented using dense representations alone, where embeddings are learned from a small number of questions and passages by a simple dual-encoder framework. When evaluated on a wide range of open-domain QA datasets, our dense retriever outperforms a strong Lucene-BM25 system largely by 9%-19% absolute in terms of top-20 passage retrieval accuracy, and helps our end-to-end QA system establish new state-of-the-art on multiple open-domain QA benchmarks.
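The retrieval step and the top-k accuracy metric mentioned in the abstract can be illustrated with a minimal sketch. It assumes that questions and passages have already been mapped to fixed-size vectors by two encoders, and that relevance is scored by inner product; the abstract does not state the similarity function, so that choice is an assumption here. The names `dot`, `top_k`, and `top_k_accuracy`, and the toy vectors, are hypothetical and are not taken from the authors' code.

```python
from typing import List, Sequence, Set

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors (assumed similarity score)."""
    return sum(a * b for a, b in zip(u, v))


def top_k(question_vec: Vector, passage_vecs: List[Vector], k: int) -> List[int]:
    """Return the indices of the k passages scoring highest against the question."""
    scores = [dot(question_vec, p) for p in passage_vecs]
    ranked = sorted(range(len(passage_vecs)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def top_k_accuracy(retrieved: List[List[int]], gold: List[Set[int]]) -> float:
    """Fraction of questions whose retrieved list contains at least one gold passage."""
    hits = sum(1 for r, g in zip(retrieved, gold) if any(i in g for i in r))
    return hits / len(retrieved)


# Toy vectors standing in for encoder outputs (hypothetical data).
passages = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
questions = [[0.9, 0.1], [0.1, 0.9]]

retrieved = [top_k(q, passages, k=2) for q in questions]
gold = [{0}, {0}]  # passage 0 is the only relevant passage for both questions
accuracy = top_k_accuracy(retrieved, gold)
```

In this toy setup the first question retrieves its gold passage and the second does not, so the top-2 accuracy is 0.5. A real system would replace the exhaustive scoring loop with an approximate nearest-neighbor index over millions of passage vectors.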
