Paper Title
Faster PAC Learning and Smaller Coresets via Smoothed Analysis
Paper Authors
Paper Abstract
PAC-learning usually aims to compute a small subset ($\varepsilon$-sample/net) from $n$ items that provably approximates a given loss function for every query (model, classifier, hypothesis) from a given set of queries, up to an additive error $\varepsilon\in(0,1)$. Coresets generalize this idea to support multiplicative error $1\pm\varepsilon$. Inspired by smoothed analysis, we suggest a natural generalization: approximate the \emph{average} (instead of the worst-case) error over the queries, in the hope of getting smaller subsets. The dependency between errors of different queries implies that we may no longer apply the Chernoff-Hoeffding inequality for a fixed query, and then use the VC-dimension or union bound. This paper provides deterministic and randomized algorithms for computing such coresets and $\varepsilon$-samples of size independent of $n$, for any finite set of queries and loss function. Example applications include new and improved coreset constructions, e.g., for streaming vector summarization [ICML'17] and $k$-PCA [NIPS'16]. Experimental results with open source code are provided.
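To make the contrast in the abstract concrete, the following is a minimal formalization sketch; the symbols $P$ (the $n$ items), $Q$ (the query set), $S$ (the subset), and $f$ (the loss) are illustrative notation not fixed by the abstract. A worst-case $\varepsilon$-sample must bound the approximation error for every query, whereas the proposed relaxation only bounds the error averaged over the queries:

$$
\text{worst-case: } \max_{q\in Q}\bigl|f(P,q)-f(S,q)\bigr|\le \varepsilon,
\qquad
\text{average-case: } \frac{1}{|Q|}\sum_{q\in Q}\bigl|f(P,q)-f(S,q)\bigr|\le \varepsilon .
$$

In the same notation, the coreset variant mentioned above replaces the additive bound by a multiplicative one, e.g. $\bigl|f(P,q)-f(S,q)\bigr|\le \varepsilon\, f(P,q)$, corresponding to the $1\pm\varepsilon$ guarantee.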