论文标题
快速准确的$ k $ -means ++通过拒绝采样
Fast and Accurate $k$-means++ via Rejection Sampling
论文作者
论文摘要
$ k $ -Means ++ \ cite {arthur2007k}是一种广泛使用的聚类算法,易于实现,具有良好的理论保证和强大的经验性能。尽管采用了广泛的采用,但$ k $ -MEANS ++有时会遭受大型数据集的缓慢,因此自然的问题是获得具有相似保证的更有效算法。在本文中,我们提出了$ K $ -MEANS ++种子的接近线性时间算法。有趣的是,我们的算法获得了与$ k $ -Means ++相同的理论保证,并显着改善了快速$ K $ -MEANS ++种子的早期结果。此外,我们从经验上表明,我们的算法明显比$ k $ -Means ++快得多,并获得了同等质量的解决方案。
$k$-means++ \cite{arthur2007k} is a widely used clustering algorithm that is easy to implement, has nice theoretical guarantees and strong empirical performance. Despite its wide adoption, $k$-means++ sometimes suffers from being slow on large data-sets so a natural question has been to obtain more efficient algorithms with similar guarantees. In this paper, we present a near linear time algorithm for $k$-means++ seeding. Interestingly our algorithm obtains the same theoretical guarantees as $k$-means++ and significantly improves earlier results on fast $k$-means++ seeding. Moreover, we show empirically that our algorithm is significantly faster than $k$-means++ and obtains solutions of equivalent quality.