论文标题
色度$ k $ - 最近的邻居查询
Chromatic $k$-Nearest Neighbor Queries
论文作者
论文摘要
令$ p $是一组$ n $彩色点。我们开发了存储$ p $的有效数据结构,并且可以回答色彩$ k $ - neartement($ k $ -nn)查询。这样的查询包括一个查询点$ q $和一个数字$ k $,并要求在$ k $ point中最常出现的颜色$ p $最接近$ q $。有效回答此类查询是获得快速$ K $ -NN分类器的关键。我们的主要目的是在使用近线性空间时获得独立于$ K $的查询时间。 我们表明,使用两个数据结构的组合是可能的。第一个数据结构使我们能够计算一个恰好包含查询点$ q $的$ k $ neart邻居的区域,然后第二个数据结构可以报告该区域中最常见的颜色。这会导致$ o(n^{1/2} \ log n)$的线性空间数据结构,以$ \ mathbb {r}^1 $中的点为点,并且查询时间在$ o(n^{2/3} {2/3} \ log log^{2/3} n)$(2/3} n)$($ O(使用的距离度量,用于$ \ mathbb {r}^2 $中的点。由于这些查询时间仍然很大,因此我们也考虑近似值。如果我们被允许报告一种至少出现$(1- \ varepsilon)f^*$次的颜色,其中$ f^*$是最常见颜色的频率,我们获得了$ o(\ log n + \ log \ log \ log \ log _ {\ frac {\ frac {\ frac {1}} {1- \ varepsilon} $ and Tirs $ and Tirs $ and Tirs $ and Query and Time $ and Query and Tirm and Tirm and Tirm and Tirm and Query和time。在$ \ tilde {o}(n^{1/2} \ varepsilon^{ - 3/2})$和$ \ tilde {o}(n^{1/2} \ varepsilon^{ - 5/5/2})$ in $ \ mathbb {r mathbb {r r}^2 $ insion clace in Matherm inniS in niment praceligant ar之间。
Let $P$ be a set of $n$ colored points. We develop efficient data structures that store $P$ and can answer chromatic $k$-nearest neighbor ($k$-NN) queries. Such a query consists of a query point $q$ and a number $k$, and asks for the color that appears most frequently among the $k$ points in $P$ closest to $q$. Answering such queries efficiently is the key to obtain fast $k$-NN classifiers. Our main aim is to obtain query times that are independent of $k$ while using near-linear space. We show that this is possible using a combination of two data structures. The first data structure allow us to compute a region containing exactly the $k$-nearest neighbors of a query point $q$, and the second data structure can then report the most frequent color in such a region. This leads to linear space data structures with query times of $O(n^{1 / 2} \log n)$ for points in $\mathbb{R}^1$, and with query times varying between $O(n^{2/3}\log^{2/3} n)$ and $O(n^{5/6} {\rm polylog} n)$, depending on the distance measure used, for points in $\mathbb{R}^2$. Since these query times are still fairly large we also consider approximations. If we are allowed to report a color that appears at least $(1-\varepsilon)f^*$ times, where $f^*$ is the frequency of the most frequent color, we obtain a query time of $O(\log n + \log\log_{\frac{1}{1-\varepsilon}} n)$ in $\mathbb{R}^1$ and expected query times ranging between $\tilde{O}(n^{1/2}\varepsilon^{-3/2})$ and $\tilde{O}(n^{1/2}\varepsilon^{-5/2})$ in $\mathbb{R}^2$ using near-linear space (ignoring polylogarithmic factors).