Paper Title

Dense Passage Retrieval for Open-Domain Question Answering

Authors

Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, Wen-tau Yih

Abstract

Open-domain question answering relies on efficient passage retrieval to select candidate contexts, where traditional sparse vector space models, such as TF-IDF or BM25, are the de facto method. In this work, we show that retrieval can be practically implemented using dense representations alone, where embeddings are learned from a small number of questions and passages by a simple dual-encoder framework. When evaluated on a wide range of open-domain QA datasets, our dense retriever outperforms a strong Lucene-BM25 system largely by 9%-19% absolute in terms of top-20 passage retrieval accuracy, and helps our end-to-end QA system establish new state-of-the-art on multiple open-domain QA benchmarks.
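
To make the retrieval setup and the top-k metric in the abstract concrete, here is a minimal sketch in plain Python. It assumes the question and passage encoders have already produced fixed-size vectors (the hand-written toy vectors below stand in for learned embeddings) and that relevance is scored by inner product. The function names `top_k` and `top_k_accuracy` and all data are hypothetical, chosen for illustration only; this is not the authors' implementation.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_k(question_vec, passage_vecs, k):
    """Return the indices of the k passages with the highest score."""
    scores = [dot(question_vec, p) for p in passage_vecs]
    ranked = sorted(range(len(passage_vecs)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def top_k_accuracy(question_vecs, passage_vecs, gold, k):
    """Fraction of questions whose top-k results include the gold passage index."""
    hits = sum(
        1
        for q, g in zip(question_vecs, gold)
        if g in top_k(q, passage_vecs, k)
    )
    return hits / len(question_vecs)


# Toy embeddings (assumed, not learned): three passages and two questions.
passages = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
questions = [[0.9, 0.1], [0.1, 0.9]]
gold = [0, 2]  # index of the passage that answers each question

acc_at_1 = top_k_accuracy(questions, passages, gold, k=1)
acc_at_2 = top_k_accuracy(questions, passages, gold, k=2)
```

In this toy setup the second question ranks its gold passage second, so it counts as a miss at k=1 and a hit at k=2. The top-20 accuracy quoted in the abstract is the same kind of measurement with k=20 over a full passage collection.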
