论文标题
欧几里得空间中分层聚类的目标及其与Biset K均值的联系
An Objective for Hierarchical Clustering in Euclidean Space and its Connection to Bisecting K-means
论文作者
论文摘要
本文探讨了分层聚类,如果对点成对的分数(例如距离)作为输入的一部分。最近引入的具有相似性分数的点的目标会导致每棵树均为1/2近似,如果距离形成度量。这表明该目标并不能在度量空间中的良好和差分层聚类之间有显着的区别。由此动机,本文为欧几里得空间中的层次聚类发展了一个新的全球目标。该目标捕获了促使使用分裂聚类算法的标准:当发生分裂时,同一群集中的点应该比不同簇中的点更相似。此外,该目标为分层聚类的基础真相输入提供了合理的结果。该论文在此目标与二等k-均值算法之间建立了理论上的联系。本文证明,最佳2均解决方案会导致物镜的恒定近似值。这是第一篇展示二等k-均值算法的论文,可在整棵树上优化自然的全球目标。
This paper explores hierarchical clustering in the case where pairs of points have dissimilarity scores (e.g. distances) as a part of the input. The recently introduced objective for points with dissimilarity scores results in every tree being a 1/2 approximation if the distances form a metric. This shows the objective does not make a significant distinction between a good and poor hierarchical clustering in metric spaces. Motivated by this, the paper develops a new global objective for hierarchical clustering in Euclidean space. The objective captures the criterion that has motivated the use of divisive clustering algorithms: that when a split happens, points in the same cluster should be more similar than points in different clusters. Moreover, this objective gives reasonable results on ground-truth inputs for hierarchical clustering. The paper builds a theoretical connection between this objective and the bisecting k-means algorithm. This paper proves that the optimal 2-means solution results in a constant approximation for the objective. This is the first paper to show the bisecting k-means algorithm optimizes a natural global objective over the entire tree.