Paper title
A Statistical Perspective on Coreset Density Estimation
Paper authors
Paper abstract
Coresets have emerged as a powerful tool to summarize data by selecting a small subset of the original observations while retaining most of its information. This approach has led to significant computational speedups, but the performance of statistical procedures run on coresets is largely unexplored. In this work, we develop a statistical framework to study coresets and focus on the canonical task of nonparametric density estimation. Our contributions are twofold. First, we establish the minimax rate of estimation achievable by coreset-based estimators. Second, we show that the practical coreset kernel density estimators are near-minimax optimal over a large class of Hölder-smooth densities.
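To make the idea of a coreset kernel density estimator concrete, the following minimal Python sketch compares a kernel density estimate built from all observations with one built from a small subset. It is an illustration under stated assumptions, not the construction analyzed in the paper: the coreset here is a plain uniform random subsample, and the Gaussian kernel, the bandwidth `h`, the sample sizes, and the helper names `gaussian_kde` and `uniform_coreset` are all hypothetical choices made for this example.

```python
import numpy as np


def gaussian_kde(points, weights, query, bandwidth):
    """Weighted Gaussian kernel density estimate at 1-D query locations."""
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    query = np.asarray(query, dtype=float)
    # Scaled pairwise differences, shape (len(query), len(points)).
    z = (query[:, None] - points[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return kernel @ weights / bandwidth


def uniform_coreset(data, m, rng):
    """Hypothetical helper: keep m observations chosen uniformly at random.

    This is a stand-in for a real coreset construction, used only to
    illustrate the "subset of the original observations" idea.
    """
    idx = rng.choice(len(data), size=m, replace=False)
    return data[idx], np.full(m, 1.0 / m)


rng = np.random.default_rng(0)
data = rng.normal(size=5000)          # assumed toy sample
grid = np.linspace(-4.0, 4.0, 201)    # evaluation points
h = 0.3                               # assumed bandwidth

# Estimate from all observations, with equal weights.
full = gaussian_kde(data, np.full(len(data), 1.0 / len(data)), grid, h)

# Estimate from a 200-point subset of the same observations.
core_pts, core_w = uniform_coreset(data, 200, rng)
small = gaussian_kde(core_pts, core_w, grid, h)

# Largest pointwise gap between the two estimates on the grid.
sup_error = np.max(np.abs(full - small))
print(f"sup-norm gap between full and coreset KDE: {sup_error:.4f}")
```

The subset-based estimate touches 200 points per query instead of 5000, which is the kind of computational saving the abstract refers to; how small the subset can be while keeping the estimate statistically accurate is the question the paper studies.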