反例暴露了2006年推出的覆盖树的时间复杂性证明的空白

论文标题

反例暴露了2006年推出的覆盖树的时间复杂性证明的空白

Counterexamples expose gaps in the proof of time complexity for cover trees introduced in 2006

论文作者

Elkin, Yury, Kurlin, Vitaliy

论文摘要

This paper is motivated by the k-nearest neighbors search: given an arbitrary metric space, and its finite subsets (a reference set R and a query set Q), design a fast algorithm to find all k-nearest neighbors in R for every point q in Q. In 2006, Beygelzimer, Kakade, and Langford introduced cover trees to justify a near-linear time complexity for the neighbor search in the sizes of Q,R. Curtin博士（2015）第5.3节指出，这一结果的证明是错误的。原始证明的关键步骤试图证明迭代次数可以通过将盖子中最长的根对叶路径的长度乘以恒定因子来估算。但是，此估计可能会错过封面树几个分支中的许多潜在节点，这应该在邻居搜索期间考虑。不幸的是，在随后的几篇论文中使用覆盖树从2006年起重复了同样的论点。本文明确构建了具有挑战性的数据集，这些数据集可为封面树构造的过去时间复杂性提供反例，在ICML 2006上提出的k-nearthigh邻居搜索以及在NIPS 2009中发布的双树搜索算法。通过使用新的压缩盖树简化了原始树结构，证明了校正后的具有额外参数的近线性时间复杂性。

This paper is motivated by the k-nearest neighbors search: given an arbitrary metric space, and its finite subsets (a reference set R and a query set Q), design a fast algorithm to find all k-nearest neighbors in R for every point q in Q. In 2006, Beygelzimer, Kakade, and Langford introduced cover trees to justify a near-linear time complexity for the neighbor search in the sizes of Q,R. Section 5.3 of Curtin's PhD (2015) pointed out that the proof of this result was wrong. The key step in the original proof attempted to show that the number of iterations can be estimated by multiplying the length of the longest root-to-leaf path in a cover tree by a constant factor. However, this estimate can miss many potential nodes in several branches of a cover tree, that should be considered during the neighbor search. The same argument was unfortunately repeated in several subsequent papers using cover trees from 2006. This paper explicitly constructs challenging datasets that provide counterexamples to the past proofs of time complexity for the cover tree construction, the k-nearest neighbor search presented at ICML 2006, and the dual-tree search algorithm published in NIPS 2009. The corrected near-linear time complexities with extra parameters are proved in another forthcoming paper by using a new compressed cover tree simplifying the original tree structure.

下载PDF全文

下载文献需遵守相关版权规定

论文标题