论文标题
与Bregman Diverences聚类的在线核心
Online Coresets for Clustering with Bregman Divergences
论文作者
论文摘要
我们提出算法,该算法根据Bregman Diverence的广泛子集,在在线设置中创建核心。值得注意的是,我们的核心具有一个小的添加误差,与轻型核心Bachem等相似。 al。 2018年,并为每个进入点$ d $是该点的尺寸的更新时间$ o(d)$。我们的第一个算法给出了在线大小$ \ tilde {o}(\ mbox {poly}(k,d,d,ε,μ))$ $ k $ - clusterings $ clusterings $ clusterings $ $ $ $ $ $ $ $ $ - 类似于bregman divergence。我们进一步扩展了该算法以显示非参数核的存在,其中核心大小独立于$ k $,即簇的数量,对于布雷格曼(Bregman)的差异相同。我们的非参数核心的较大元素(\ log n)$($ n $是点数),并且具有相似的(小)添加剂保证。同时,我们的核心也充当了Bregman聚类(如DP均值)的非参数版本的轻质核心。尽管这些核心提供了加性错误的保证,但它们也较小(用$ O(\ log n)$比例比$ O(d^d)$相比,$ r^d $中的点比Bachem et中获得的(相对 - 错误)核心。 al。 2015年DP均值。尽管我们的非参数核心是存在的,但我们在某些假设下提供了算法版本。
We present algorithms that create coresets in an online setting for clustering problems according to a wide subset of Bregman divergences. Notably, our coresets have a small additive error, similar in magnitude to the lightweight coresets Bachem et. al. 2018, and take update time $O(d)$ for every incoming point where $d$ is dimension of the point. Our first algorithm gives online coresets of size $\tilde{O}(\mbox{poly}(k,d,ε,μ))$ for $k$-clusterings according to any $μ$-similar Bregman divergence. We further extend this algorithm to show existence of a non-parametric coresets, where the coreset size is independent of $k$, the number of clusters, for the same subclass of Bregman divergences. Our non-parametric coresets are larger by a factor of $O(\log n)$ ($n$ is number of points) and have similar (small) additive guarantee. At the same time our coresets also function as lightweight coresets for non-parametric versions of the Bregman clustering like DP-Means. While these coresets provide additive error guarantees, they are also significantly smaller (scaling with $O(\log n)$ as opposed to $O(d^d)$ for points in $R^d$) than the (relative-error) coresets obtained in Bachem et. al. 2015 for DP-Means. While our non-parametric coresets are existential, we give an algorithmic version under certain assumptions.