论文标题

基于树的搜索图,以大约最近的邻居搜索

Tree-based Search Graph for Approximate Nearest Neighbor Search

论文作者

Fan, Xiaobin, Wang, Xiaoping, Lu, Kai, Xue, Lei, Zhao, Jinjing

论文摘要

最近的邻居搜索支持许多域中的重要应用程序,例如数据库,机器学习,计算机视觉。由于准确搜索的计算成本太高,因此社区转向了大约最近的邻居搜索(ANN)的研究。其中,基于图的算法是最重要的分支之一。 Fu等人的研究。表明,基于单调搜索网络(MSNET)(例如NSG和NSSG)的算法已在效率上实现了最新的搜索性能。 MSNET致力于以最小的节点超过节点来实现单调搜索,以提高高效率。但是,当前的MSNET设计并未优化单调搜索的概率,概率的下限仅为50%。如果他们在单调搜索阶段失败,则必须遭受巨大的回溯成本才能达到所需的准确性。这将导致搜索效率的性能问题。为了解决这个问题,我们提出(R,P)-MSNET,可以确保单调搜索的概率。由于严格(R,P)-MSNET的建筑复杂性很高,我们提出了TBSG,这是一个近似值较低的近似值。在四百万个尺度数据集上进行的实验表明,TBSG在搜索效率方面的表现优于现有的基于图形的算法。我们的代码已在Github上发布。

Nearest neighbor search supports important applications in many domains, such as database, machine learning, computer vision. Since the computational cost for accurate search is too high, the community turned to the research of approximate nearest neighbor search (ANNS). Among them, graph-based algorithm is one of the most important branches. Research by Fu et al. shows that the algorithms based on Monotonic Search Network (MSNET), such as NSG and NSSG, have achieved the state-of-the-art search performance in efficiency. The MSNET is dedicated to achieving monotonic search with minimal out-degree of nodes to pursue high efficiency. However, the current MSNET designs did not optimize the probability of the monotonic search, and the lower bound of the probability is only 50%. If they fail in monotonic search stage, they have to suffer tremendous backtracking cost to achieve the required accuracy. This will cause performance problems in search efficiency. To address this problem, we propose (r,p)-MSNET, which achieves guaranteed probability on monotonic search. Due to the high building complexity of a strict (r,p)-MSNET, we propose TBSG, which is an approximation with low complexity. Experiment conducted on four million-scaled datasets show that TBSG outperforms existing state-of-the-art graph-based algorithms in search efficiency. Our code has been released on Github.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源