Paper Title

Optimal Coreset for Gaussian Kernel Density Estimation

Author

Tai, Wai Ming

Abstract

Given a point set $P\subset \mathbb{R}^d$, the kernel density estimate of $P$ is defined as \[ \overline{\mathcal{G}}_P(x) = \frac{1}{\left|P\right|}\sum_{p\in P}e^{-\left\lVert x-p \right\rVert^2} \] for any $x\in\mathbb{R}^d$. We study how to construct a small subset $Q$ of $P$ such that the kernel density estimate of $P$ is approximated by the kernel density estimate of $Q$. This subset $Q$ is called a coreset. The main technique in this work is constructing a $\pm 1$ coloring on the point set $P$ by discrepancy theory and we leverage Banaszczyk's Theorem. When $d>1$ is a constant, our construction gives a coreset of size $O\left(\frac{1}{\varepsilon}\right)$ as opposed to the best-known result of $O\left(\frac{1}{\varepsilon}\sqrt{\log\frac{1}{\varepsilon}}\right)$. It is the first result to give a breakthrough on the barrier of $\sqrt{\log}$ factor even when $d=2$.
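
To make the definitions in the abstract concrete, the sketch below evaluates $\overline{\mathcal{G}}_P(x)$ exactly as written above and measures how far the estimate of a subset $Q$ deviates from that of $P$. It is only an illustration under stated assumptions: the subset is a uniform random sample used as a stand-in, not the discrepancy-based construction from the paper, and the names `gaussian_kde` and `max_kde_error`, the data, and the query grid are all hypothetical. Taking the maximum over a finite set of queries gives only a lower bound on the true worst-case error over $\mathbb{R}^d$.

```python
import math
import random


def gaussian_kde(points, x):
    """Evaluate (1/|P|) * sum over p in P of exp(-||x - p||^2) at query x."""
    total = 0.0
    for p in points:
        sq_dist = sum((xi - pi) ** 2 for xi, pi in zip(x, p))
        total += math.exp(-sq_dist)
    return total / len(points)


def max_kde_error(P, Q, queries):
    """Largest |KDE_P(x) - KDE_Q(x)| over the given finite set of queries."""
    return max(abs(gaussian_kde(P, x) - gaussian_kde(Q, x)) for x in queries)


# Hypothetical data: 2000 points in the plane (d = 2).
rng = random.Random(0)
P = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(2000)]

# Stand-in subset: a uniform random sample, NOT the paper's coreset construction.
Q = rng.sample(P, 200)

# Hypothetical query points used to probe the approximation error.
queries = [(rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0)) for _ in range(200)]

err = max_kde_error(P, Q, queries)
print(f"observed max error on the query set: {err:.4f}")
```

A subset $Q$ qualifies as a coreset for a given $\varepsilon$ when this error stays at most $\varepsilon$ for every $x\in\mathbb{R}^d$; the result stated in the abstract concerns how small $Q$ can be made while meeting that guarantee.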
