Paper title
Beyond Lexical: A Semantic Retrieval Framework for Textual SearchEngine
Paper authors
Paper abstract
Search engines have become a fundamental component of various web and mobile applications. Retrieving relevant documents from massive datasets is challenging for a search engine system, especially when it is faced with verbose or tail queries. In this paper, we explore a vector space search framework for document retrieval. Specifically, we train a deep semantic matching model so that each query and document can be encoded as a low-dimensional embedding. Our model is based on the BERT architecture. We deploy a fast k-nearest-neighbor index service for online serving. Both offline and online metrics demonstrate that our method improves retrieval performance and search quality considerably, particularly for tail
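The retrieval step described in the abstract, encoding queries and documents as embeddings and then looking up the nearest document vectors, can be illustrated with a minimal sketch. It is not the paper's implementation: the three-dimensional vectors are hypothetical stand-ins for encoder output, the names `normalize`, `top_k`, and `DOC_EMBEDDINGS` are invented for illustration, and the exact brute-force scan stands in for the fast k-nearest-neighbor index service, whose details the abstract does not give.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k(query_vec, doc_vecs, k):
    """Return the k (doc_id, score) pairs with the highest cosine similarity.

    This is an exact brute-force scan over every document. It is an assumption
    made for illustration; a serving system would use a dedicated index.
    """
    q = normalize(query_vec)
    scored = (
        (doc_id, sum(a * b for a, b in zip(q, normalize(vec))))
        for doc_id, vec in doc_vecs.items()
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical 3-dimensional embeddings standing in for encoder output.
DOC_EMBEDDINGS = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.7, 0.7, 0.1],
}

if __name__ == "__main__":
    query_embedding = [1.0, 0.2, 0.0]  # hypothetical encoded query
    for doc_id, score in top_k(query_embedding, DOC_EMBEDDINGS, k=2):
        print(f"{doc_id}\t{score:.3f}")
```

Because the scan compares the query against every document, its cost grows linearly with the collection size, which is why a framework of this kind would rely on an index service for online serving.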