Paper Title
The Case for Learned Spatial Indexes
Paper Authors
Abstract
Spatial data is ubiquitous. Massive amounts of data are generated every day by billions of GPS-enabled devices such as cell phones, cars, and sensors, and by consumer applications such as Uber, Tinder, and location-tagged posts on Facebook, Twitter, Instagram, etc. This exponential growth in spatial data has led the research community to focus on building systems and applications that can process spatial data efficiently. In the meantime, recent research has introduced learned index structures. In this work, we use techniques proposed in a state-of-the-art learned multi-dimensional index structure (namely, Flood) and apply them to five classical multi-dimensional indexes so that they can answer spatial range queries. By tuning each partitioning technique for optimal performance, we show that (i) machine-learned search within a partition is 11.79% to 39.51% faster than binary search when filtering on one dimension, (ii) the bottleneck for tree structures is index lookup, which could potentially be improved by linearizing the indexed partitions, (iii) filtering on one dimension and refining using machine-learned indexes is 1.23x to 1.83x faster than the closest competitor, which filters on two dimensions, and (iv) learned indexes can have a significant impact on the performance of low-selectivity queries while being less effective under higher selectivities.
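To make the idea of "learned search within a partition" concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a single partition of 2D points sorted on one dimension (y), a single least-squares linear model that predicts a key's position, and a bounded binary search inside the model's worst-case error window. The class name `LearnedPartition` and all method names are hypothetical, and the simple linear model stands in for whatever model a real learned index would use. A full index would first select the partitions that overlap the query on the other dimension; here the x-range is checked by scanning.

```python
import bisect
import math
import random


class LearnedPartition:
    """One partition of (x, y) points, sorted by y, with a linear position model.

    Illustrative assumption: a single linear fit position ~ a * y + b.
    """

    def __init__(self, points):
        if not points:
            raise ValueError("partition must contain at least one point")
        # Sort on the dimension that the model is trained on.
        self.points = sorted(points, key=lambda p: p[1])
        self.ys = [p[1] for p in self.points]
        n = len(self.ys)
        # Least-squares fit of position i against key ys[i].
        mean_y = sum(self.ys) / n
        mean_i = (n - 1) / 2
        var = sum((y - mean_y) ** 2 for y in self.ys)
        cov = sum((y - mean_y) * (i - mean_i) for i, y in enumerate(self.ys))
        # Clamp the slope so the model stays monotone in y.
        self.a = max(0.0, cov / var) if var > 0 else 0.0
        self.b = mean_i - self.a * mean_y
        # Worst-case error over the stored keys bounds the search window.
        self.err = max(abs(self._predict(y) - i) for i, y in enumerate(self.ys))

    def _predict(self, y):
        return self.a * y + self.b

    def lower_bound(self, y):
        """Index of the first stored key >= y (model guess, then bounded search)."""
        n = len(self.ys)
        pred = self._predict(y)
        lo = max(0, min(n, math.floor(pred - self.err)))
        hi = max(lo, min(n, math.ceil(pred + self.err) + 1))
        return bisect.bisect_left(self.ys, y, lo, hi)

    def range_query(self, x_lo, x_hi, y_lo, y_hi):
        """Points inside the rectangle: learned search on y, scan and check x."""
        out = []
        i = self.lower_bound(y_lo)
        while i < len(self.points) and self.ys[i] <= y_hi:
            x, y = self.points[i]
            if x_lo <= x <= x_hi:
                out.append((x, y))
            i += 1
        return out


# Example usage on synthetic uniform data (an assumption, not the paper's datasets).
random.seed(0)
pts = [(random.random(), random.random()) for _ in range(1000)]
part = LearnedPartition(pts)
hits = part.range_query(0.2, 0.6, 0.3, 0.5)
```

The bounded window is what can make this cheaper than a plain binary search over the whole partition: the tighter the model fits the key distribution, the fewer elements the final search has to touch. How large that benefit is in practice depends on the data and the model, which is what the paper's measurements address.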