论文标题

使用极端订单统计的伴随物

Sublinear Maximum Inner Product Search using Concomitants of Extreme Order Statistics

论文作者

Pham, Ninh

论文摘要

我们根据极端阶统计的伴随理论提出了一种最大内部产品搜索(MIPS)的新颖维度降低方法,名为CEO。利用这些伴随物的渐近行为,我们表明,与查询特征的极端值相关的一些维度足以估计内部产物。由于CEO仅使用查询签名的一小部分进行估计的迹象,因此我们可以在查询之前准确地对所有内部产品估计器进行预先计算。这些属性产生具有指数索引空间复杂性的均值MIPS算法。我们表明,我们的指数空间对于单位球体上的$(1 +ε)$ - 近似MIPS是最佳的。在温和的情况下,可以在理论上保证CEO的搜索召回。 为了处理指数空间的复杂性,我们提出了两个实用变体,包括SCEOS-TA和COCEO,它们使用线性空间来求解MIP。 SCEOS-TA利用阈值算法(TA),并为竞争性MIPS求解器提供了出色的搜索召回。 COCEO是一种数据和维度共同还原技术,并且在高召回要求上胜过SCEOS-TA。从经验上讲,与Bruteforce搜索相比,它们非常易于实现,并且至少达到100倍的速度,同时在许多大规模数据集中返回至少90%的前十名MIPS。

We propose a novel dimensionality reduction method for maximum inner product search (MIPS), named CEOs, based on the theory of concomitants of extreme order statistics. Utilizing the asymptotic behavior of these concomitants, we show that a few dimensions associated with the extreme values of the query signature are enough to estimate inner products. Since CEOs only uses the sign of a small subset of the query signature for estimation, we can precompute all inner product estimators accurately before querying. These properties yield a sublinear MIPS algorithm with an exponential indexing space complexity. We show that our exponential space is optimal for the $(1 + ε)$-approximate MIPS on a unit sphere. The search recall of CEOs can be theoretically guaranteed under a mild condition. To deal with the exponential space complexity, we propose two practical variants, including sCEOs-TA and coCEOs, that use linear space for solving MIPS. sCEOs-TA exploits the threshold algorithm (TA) and provides superior search recalls to competitive MIPS solvers. coCEOs is a data and dimension co-reduction technique and outperforms sCEOs-TA on high recall requirements. Empirically, they are very simple to implement and achieve at least 100x speedup compared to the bruteforce search while returning top-10 MIPS with accuracy at least 90% on many large-scale data sets.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源