快速准确的$ k $ -means ++通过拒绝采样

论文标题

快速准确的$ k $ -means ++通过拒绝采样

Fast and Accurate $k$-means++ via Rejection Sampling

论文作者

Cohen-Addad, Vincent, Lattanzi, Silvio, Norouzi-Fard, Ashkan, Sohler, Christian, Svensson, Ola

论文摘要

$ k $ -Means ++ \ cite {arthur2007k}是一种广泛使用的聚类算法，易于实现，具有良好的理论保证和强大的经验性能。尽管采用了广泛的采用，但$ k $ -MEANS ++有时会遭受大型数据集的缓慢，因此自然的问题是获得具有相似保证的更有效算法。在本文中，我们提出了$ K $ -MEANS ++种子的接近线性时间算法。有趣的是，我们的算法获得了与$ k $ -Means ++相同的理论保证，并显着改善了快速$ K $ -MEANS ++种子的早期结果。此外，我们从经验上表明，我们的算法明显比$ k $ -Means ++快得多，并获得了同等质量的解决方案。

$k$-means++ \cite{arthur2007k} is a widely used clustering algorithm that is easy to implement, has nice theoretical guarantees and strong empirical performance. Despite its wide adoption, $k$-means++ sometimes suffers from being slow on large data-sets so a natural question has been to obtain more efficient algorithms with similar guarantees. In this paper, we present a near linear time algorithm for $k$-means++ seeding. Interestingly our algorithm obtains the same theoretical guarantees as $k$-means++ and significantly improves earlier results on fast $k$-means++ seeding. Moreover, we show empirically that our algorithm is significantly faster than $k$-means++ and obtains solutions of equivalent quality.

下载PDF全文

下载文献需遵守相关版权规定

论文标题