Paper title
A semi-supervised sparse K-Means algorithm
Paper authors
Paper abstract
We consider the problem of clustering data when the quality of the features is unknown and only a small amount of labelled data is available. An unsupervised sparse clustering method can detect the subset of features needed for clustering, while a semi-supervised method can use the labelled data to create constraints and improve the clustering solution. In this paper we propose a K-Means variant that combines both techniques. We show that the algorithm maintains the high performance of other semi-supervised algorithms and, in addition, preserves the ability to distinguish informative from uninformative features. We examine its performance on synthetic and real-world data sets, under scenarios with different numbers and types of constraints and with different cluster initialisation methods.
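
The abstract mentions using labelled data to create constraints. As a minimal sketch of one common way to do this (pairwise must-link and cannot-link constraints, which is an assumption here and not necessarily the scheme used in the paper), the function below turns a small set of labelled points into constraint pairs. The name `constraints_from_labels` and the example labels are hypothetical.

```python
from itertools import combinations


def constraints_from_labels(labelled):
    """Build pairwise constraints from a {point_index: class_label} mapping.

    Assumption (not from the paper): two labelled points with the same
    label form a must-link pair, and two with different labels form a
    cannot-link pair.
    """
    must_link, cannot_link = [], []
    # Visit every unordered pair of labelled points once, in index order.
    for (i, label_i), (j, label_j) in combinations(sorted(labelled.items()), 2):
        if label_i == label_j:
            must_link.append((i, j))
        else:
            cannot_link.append((i, j))
    return must_link, cannot_link


# Hypothetical example: three labelled points out of a larger data set.
labelled = {0: "a", 3: "a", 7: "b"}
must_link, cannot_link = constraints_from_labels(labelled)
```

A constrained K-Means variant could then penalise or forbid assignments that violate these pairs. The number of pairs grows quadratically with the number of labelled points, which is one reason experiments of this kind often vary the number of constraints supplied.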