色度$ k $ - 最近的邻居查询

论文标题

色度$ k $ - 最近的邻居查询

Chromatic $k$-Nearest Neighbor Queries

论文作者

van der Horst, Thijs, Löffler, Maarten, Staals, Frank

论文摘要

令$ p $是一组$ n $彩色点。我们开发了存储$ p $的有效数据结构，并且可以回答色彩$ k $ - neartement（$ k $ -nn）查询。这样的查询包括一个查询点$ q $和一个数字$ k $，并要求在$ k $ point中最常出现的颜色$ p $最接近$ q $。有效回答此类查询是获得快速$ K $ -NN分类器的关键。我们的主要目的是在使用近线性空间时获得独立于$ K $的查询时间。我们表明，使用两个数据结构的组合是可能的。第一个数据结构使我们能够计算一个恰好包含查询点$ q $的$ k $ neart邻居的区域，然后第二个数据结构可以报告该区域中最常见的颜色。这会导致$ o（n^{1/2} \ log n）$的线性空间数据结构，以$ \ mathbb {r}^1 $中的点为点，并且查询时间在$ o（n^{2/3} {2/3} \ log log^{2/3} n）$（2/3} n）$（$ O（使用的距离度量，用于$ \ mathbb {r}^2 $中的点。由于这些查询时间仍然很大，因此我们也考虑近似值。如果我们被允许报告一种至少出现$（1- \ varepsilon）f^*$次的颜色，其中$ f^*$是最常见颜色的频率，我们获得了$ o（\ log n + \ log \ log \ log \ log _ {\ frac {\ frac {\ frac {1}} {1- \ varepsilon} $ and Tirs $ and Tirs $ and Tirs $ and Query and Time $ and Query and Tirm and Tirm and Tirm and Tirm and Query和time。在$ \ tilde {o}（n^{1/2} \ varepsilon^{ - 3/2}）$和$ \ tilde {o}（n^{1/2} \ varepsilon^{ - 5/5/2}）$ in $ \ mathbb {r mathbb {r r}^2 $ insion clace in Matherm inniS in niment praceligant ar之间。

Let $P$ be a set of $n$ colored points. We develop efficient data structures that store $P$ and can answer chromatic $k$-nearest neighbor ($k$-NN) queries. Such a query consists of a query point $q$ and a number $k$, and asks for the color that appears most frequently among the $k$ points in $P$ closest to $q$. Answering such queries efficiently is the key to obtain fast $k$-NN classifiers. Our main aim is to obtain query times that are independent of $k$ while using near-linear space. We show that this is possible using a combination of two data structures. The first data structure allow us to compute a region containing exactly the $k$-nearest neighbors of a query point $q$, and the second data structure can then report the most frequent color in such a region. This leads to linear space data structures with query times of $O(n^{1 / 2} \log n)$ for points in $\mathbb{R}^1$, and with query times varying between $O(n^{2/3}\log^{2/3} n)$ and $O(n^{5/6} {\rm polylog} n)$, depending on the distance measure used, for points in $\mathbb{R}^2$. Since these query times are still fairly large we also consider approximations. If we are allowed to report a color that appears at least $(1-\varepsilon)f^*$ times, where $f^*$ is the frequency of the most frequent color, we obtain a query time of $O(\log n + \log\log_{\frac{1}{1-\varepsilon}} n)$ in $\mathbb{R}^1$ and expected query times ranging between $\tilde{O}(n^{1/2}\varepsilon^{-3/2})$ and $\tilde{O}(n^{1/2}\varepsilon^{-5/2})$ in $\mathbb{R}^2$ using near-linear space (ignoring polylogarithmic factors).

下载PDF全文

下载文献需遵守相关版权规定

论文标题