Title
Flexible retrieval with NMSLIB and FlexNeuART
Authors
Abstract
Our objective is to introduce the NLP community to an existing k-NN search library, NMSLIB, a new retrieval toolkit, FlexNeuART, and their integration capabilities. NMSLIB, while being one of the fastest k-NN search libraries, is quite generic and supports a variety of distance/similarity functions. Because the library relies on distance-based, structure-agnostic algorithms, it can be further extended by adding new distances. FlexNeuART is a modular, extensible, and flexible toolkit for candidate generation in information retrieval (IR) and question answering (QA) applications, which supports mixing classic and neural ranking signals. FlexNeuART can efficiently retrieve mixed dense and sparse representations (with weights learned from training data), which is achieved by extending NMSLIB. In contrast, other retrieval systems work with purely sparse representations (e.g., Lucene), purely dense representations (e.g., FAISS and Annoy), or perform mixing only at the re-ranking stage.
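A structure-agnostic search method needs nothing from the data except a distance function. Because of this, adding a new distance is enough to support a new kind of representation. The minimal sketch below illustrates that idea. It is not the NMSLIB or FlexNeuART API. `knn_search` and `make_mixed_distance` are hypothetical names. The search is exact brute force rather than the approximate methods (such as HNSW) that NMSLIB provides. The mixed score is assumed to be a weighted sum of a dense inner product and a sparse inner product. The weights `w_dense` and `w_sparse` stand in for weights that would be learned from training data, and the score is turned into a "distance" by negation.

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def knn_search(data: Sequence[T], query: T, k: int,
               dist: Callable[[T, T], float]) -> List[Tuple[float, int]]:
    """Exact k-NN by brute force (hypothetical helper).

    Only the `dist` callback is used, so any distance can be plugged in
    without changing the search code. Returns (distance, index) pairs,
    nearest first.
    """
    return heapq.nsmallest(k, ((dist(query, x), i) for i, x in enumerate(data)))

def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Ordinary Euclidean distance between two dense vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Assumed mixed representation: (dense vector, sparse term -> weight map).
MixedVec = Tuple[List[float], Dict[str, float]]

def make_mixed_distance(w_dense: float,
                        w_sparse: float) -> Callable[[MixedVec, MixedVec], float]:
    """Build a hypothetical dense+sparse "distance" from a weighted inner product."""
    def dist(q: MixedVec, d: MixedVec) -> float:
        q_dense, q_sparse = q
        d_dense, d_sparse = d
        dense_ip = sum(x * y for x, y in zip(q_dense, d_dense))
        # Iterate over the smaller sparse vector and look terms up in the larger one.
        if len(q_sparse) > len(d_sparse):
            q_sparse, d_sparse = d_sparse, q_sparse
        sparse_ip = sum(w * d_sparse.get(t, 0.0) for t, w in q_sparse.items())
        # Higher combined similarity means a smaller "distance".
        return -(w_dense * dense_ip + w_sparse * sparse_ip)
    return dist

# Toy data: each document has a 2-d dense part and a sparse bag of weighted terms.
docs: List[MixedVec] = [
    ([1.0, 0.0], {"neural": 2.0, "ranking": 1.0}),
    ([0.0, 1.0], {"lucene": 3.0}),
    ([0.7, 0.7], {"ranking": 2.0, "knn": 1.0}),
]
query: MixedVec = ([1.0, 0.0], {"ranking": 1.0})

mixed = make_mixed_distance(w_dense=0.5, w_sparse=1.0)
print([i for _, i in knn_search(docs, query, k=2, dist=mixed)])  # → [2, 0]
```

The same `knn_search` runs unchanged with `dist=l2` on plain dense vectors. Only the distance differs between the two uses, which mirrors the abstract's point that a distance-based library can be extended with new distances rather than new index code.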