Paper Title
Speed-ANN: Low-Latency and High-Accuracy Nearest Neighbor Search via Intra-Query Parallelism
Paper Authors
Paper Abstract
Nearest Neighbor Search (NNS) has recently drawn rapidly increasing interest due to its core role in managing high-dimensional vector data in data science and AI applications. This interest is fueled by the success of neural embeddings, where deep learning models transform unstructured data into semantically correlated feature vectors for data analysis, e.g., recommending popular items. Among the several categories of methods for fast NNS, similarity graphs are one of the most successful algorithmic trends. Several of the most popular and best-performing similarity graphs, such as NSG and HNSW, employ at their core a best-first traversal over the underlying graph index to find near neighbors. Maximizing search performance is essential for many tasks, especially in the large-scale, high-recall regime. In this work, we provide an in-depth examination of state-of-the-art similarity search algorithms, revealing their difficulties in leveraging multi-core processors to speed up search. We also explore whether similarity graph search is robust to deviations from strict best-first order when multiple walkers are allowed to simultaneously advance the search frontier. Based on our insights, we propose Speed-ANN, a parallel similarity search algorithm that exploits hidden intra-query parallelism and the memory hierarchy, allowing similarity search to take advantage of multiple CPU cores to significantly accelerate search while achieving high accuracy. We evaluate Speed-ANN on a wide range of datasets, from millions to billions of data points, and show that it achieves shorter query latency than both NSG and HNSW. Moreover, with multi-core support, we show that our approach offers lower search latency than a highly optimized GPU implementation and scales well as hardware resources (e.g., CPU cores) and graph sizes grow.
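For readers unfamiliar with the search primitive the abstract refers to, the sketch below illustrates the best-first (beam) traversal that NSG/HNSW-style graph indices build on. It is a minimal single-threaded sketch under stated assumptions: the names (`best_first_search`, `l2`), the adjacency-list layout, and the parameters (`entry`, `ef`, `k`) are illustrative, not Speed-ANN's actual API or its intra-query-parallel implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <queue>
#include <unordered_set>
#include <vector>

// One candidate vertex: its id and its distance to the query.
struct Candidate {
    float dist;
    int id;
};

// Squared Euclidean distance between two vectors of equal dimension.
static float l2(const std::vector<float>& a, const std::vector<float>& b) {
    float s = 0.f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        float d = a[i] - b[i];
        s += d * d;
    }
    return s;
}

// Best-first (beam) search over a similarity graph: repeatedly expand the
// closest unexpanded candidate, keeping at most `ef` candidates in the pool.
std::vector<int> best_first_search(const std::vector<std::vector<int>>& graph,
                                   const std::vector<std::vector<float>>& points,
                                   const std::vector<float>& query,
                                   int entry, int ef, int k) {
    auto closer = [](const Candidate& a, const Candidate& b) { return a.dist < b.dist; };
    auto farther = [](const Candidate& a, const Candidate& b) { return a.dist > b.dist; };

    // Min-heap on distance: the closest unexpanded candidate sits on top.
    std::priority_queue<Candidate, std::vector<Candidate>, decltype(farther)> frontier(farther);
    std::vector<Candidate> pool;            // best `ef` candidates seen so far, sorted by distance
    std::unordered_set<int> visited{entry};

    Candidate start{l2(points[entry], query), entry};
    frontier.push(start);
    pool.push_back(start);

    while (!frontier.empty()) {
        Candidate cur = frontier.top();
        frontier.pop();
        // Stop once the closest unexpanded candidate cannot improve the pool.
        if (static_cast<int>(pool.size()) >= ef && cur.dist > pool.back().dist) break;

        for (int nb : graph[cur.id]) {                 // expand cur's neighbors
            if (!visited.insert(nb).second) continue;  // already seen
            Candidate c{l2(points[nb], query), nb};
            if (static_cast<int>(pool.size()) < ef || c.dist < pool.back().dist) {
                frontier.push(c);
                pool.insert(std::upper_bound(pool.begin(), pool.end(), c, closer), c);
                if (static_cast<int>(pool.size()) > ef) pool.pop_back();
            }
        }
    }

    std::vector<int> result;
    for (int i = 0; i < k && i < static_cast<int>(pool.size()); ++i) result.push_back(pool[i].id);
    return result;
}

int main() {
    // Toy 2-D dataset and a hand-built neighborhood graph (adjacency lists).
    std::vector<std::vector<float>> pts = {{0, 0}, {1, 0}, {0, 1}, {5, 5}, {6, 5}};
    std::vector<std::vector<int>> graph = {{1, 2}, {0, 3}, {0, 4}, {1, 4}, {2, 3}};
    for (int id : best_first_search(graph, pts, {5.5f, 5.0f}, /*entry=*/0, /*ef=*/4, /*k=*/2))
        std::cout << id << "\n";            // prints the two nearest ids (3 and 4)
    return 0;
}
```

In this baseline, the `ef`-sized pool plays the role of the search frontier mentioned in the abstract, and candidates are expanded strictly one at a time in closest-first order; the idea explored by the paper is to relax that strict order so that multiple walkers can expand frontier candidates concurrently on multiple CPU cores.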