Paper title
Point-Set Kernel Clustering
Paper authors
Paper abstract
Measuring the similarity between two objects is the core operation that existing clustering algorithms use to group similar objects into clusters. This paper introduces a new similarity measure, called the point-set kernel, which computes the similarity between an object and a set of objects. The proposed clustering procedure uses this measure to characterize every cluster grown from a seed object. We show that the new clustering procedure is both effective and efficient, which enables it to handle large-scale datasets. In contrast, existing clustering algorithms are either efficient or effective, but not both. In comparison with state-of-the-art density-peak clustering and scalable kernel k-means clustering, we show that the proposed algorithm is more effective and runs orders of magnitude faster when applied to datasets of millions of data points on a commonly used computing machine.
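To make the idea of a point-to-set similarity concrete, the following is a minimal sketch and not the paper's implementation. It assumes a Gaussian point-to-point kernel and defines the point-set similarity as the mean kernel value between a point and every member of a set; the abstract does not specify either choice. The function names (`gaussian_kernel`, `point_set_similarity`, `grow_cluster`) and the `threshold` and `bandwidth` parameters are hypothetical, introduced only for illustration. This naive version recomputes pairwise kernel values at every step, so it does not reflect the efficiency the abstract reports.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Point-to-point Gaussian kernel (an assumed stand-in, not the paper's kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def point_set_similarity(x, group, bandwidth=1.0):
    """Similarity between one point and a set: the mean kernel value over the set.

    Assumption: averaging is one simple way to lift a point-to-point kernel
    to a point-to-set measure; the abstract does not give the definition.
    """
    if not group:
        raise ValueError("group must contain at least one point")
    return sum(gaussian_kernel(x, y, bandwidth) for y in group) / len(group)


def grow_cluster(points, seed_index, threshold, bandwidth=1.0):
    """Grow one cluster from a seed point (hypothetical procedure).

    At each step, admit the unassigned point most similar to the current
    cluster, and stop once that best similarity falls below `threshold`.
    Returns the sorted indices of the cluster members.
    """
    members = [seed_index]
    remaining = set(range(len(points))) - {seed_index}
    while remaining:
        group = [points[i] for i in members]
        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(
            sorted(remaining),
            key=lambda i: point_set_similarity(points[i], group, bandwidth),
        )
        if point_set_similarity(points[best], group, bandwidth) < threshold:
            break
        members.append(best)
        remaining.remove(best)
    return members if members == sorted(members) else sorted(members)


if __name__ == "__main__":
    # Two well-separated toy groups in the plane.
    points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
    print(grow_cluster(points, seed_index=0, threshold=0.5))
    print(grow_cluster(points, seed_index=3, threshold=0.5))
```

In this sketch the cluster is described by its similarity to candidate points as a whole set, rather than by individual pairwise comparisons, which is the role the abstract assigns to the point-set kernel.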