Leveraging Semantic and Lexical Matching to Improve the Recall of Document Retrieval Systems: A Hybrid Approach

Paper Title

Leveraging Semantic and Lexical Matching to Improve the Recall of Document Retrieval Systems: A Hybrid Approach

Authors

Saar Kuzi, Mingyang Zhang, Cheng Li, Michael Bendersky, Marc Najork

Abstract

Search engines often follow a two-phase paradigm where in the first stage (the retrieval stage) an initial set of documents is retrieved and in the second stage (the re-ranking stage) the documents are re-ranked to obtain the final result list. While deep neural networks were shown to improve the performance of the re-ranking stage in previous works, there is little literature about using deep neural networks to improve the retrieval stage. In this paper, we study the merits of combining deep neural network models and lexical models for the retrieval stage. A hybrid approach, which leverages both semantic (deep neural network-based) and lexical (keyword matching-based) retrieval models, is proposed. We perform an empirical study, using a publicly available TREC collection, which demonstrates the effectiveness of our approach and sheds light on the different characteristics of the semantic approach, the lexical approach, and their combination.
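
The abstract does not say how the two retrieval models are combined, so the following is only a minimal sketch of one plausible reading: retrieve a candidate list from each model and merge them, so that a document missed by keyword matching can still reach the re-ranking stage. The term-count scorer, the two-dimensional vectors, and all function names are hypothetical stand-ins chosen for illustration; they are not the models or the merging strategy used in the paper.

```python
import math
from collections import Counter
from itertools import zip_longest


def lexical_top_k(query, docs, k):
    """Toy keyword matcher: score = number of query-term occurrences.

    A stand-in for a real lexical model; documents with no match are dropped.
    """
    q_terms = set(query.lower().split())
    scores = {}
    for doc_id, text in docs.items():
        counts = Counter(text.lower().split())
        score = sum(counts[t] for t in q_terms)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=lambda d: (-scores[d], d))[:k]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def semantic_top_k(query_vec, doc_vecs, k):
    """Toy semantic matcher: rank by cosine similarity of given embeddings.

    The embeddings are assumed to come from some neural encoder (not shown).
    """
    return sorted(doc_vecs, key=lambda d: (-cosine(query_vec, doc_vecs[d]), d))[:k]


def hybrid_candidates(lexical, semantic):
    """Merge two ranked lists by alternating between them, skipping duplicates."""
    merged, seen = [], set()
    for pair in zip_longest(lexical, semantic):
        for doc_id in pair:
            if doc_id is not None and doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged


# Hypothetical toy data: d2 is relevant but shares no terms with the query.
docs = {
    "d1": "car repair manual",
    "d2": "automobile maintenance guide",
    "d3": "cooking recipes",
}
doc_vecs = {"d1": [0.9, 0.1], "d2": [0.8, 0.2], "d3": [0.0, 1.0]}
query, query_vec = "car repair", [1.0, 0.0]

lexical = lexical_top_k(query, docs, k=2)
semantic = semantic_top_k(query_vec, doc_vecs, k=2)
candidates = hybrid_candidates(lexical, semantic)
print(lexical, semantic, candidates)
```

In this toy setup the keyword matcher finds only `d1`, while the merged list also contains `d2`, which illustrates the recall argument made in the abstract. A real system would pass the merged candidates to the re-ranking stage.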

Paper Title