Paper Title
Tk-merge: Computationally Efficient Robust Clustering Under General Assumptions
Authors
Abstract
We address general-shaped clustering problems under very weak parametric assumptions with a two-step hybrid robust clustering algorithm based on trimmed k-means and hierarchical agglomeration. The algorithm has low computational complexity and effectively identifies clusters even in the presence of data contamination. We also present natural generalizations of the approach, as well as an adaptive procedure that estimates the amount of contamination in a data-driven fashion. In our numerical simulations and real-world applications, our proposal outperforms state-of-the-art robust, model-based methods. The applications cover color quantization for image analysis, human mobility patterns based on GPS data, biomedical images of diabetic retinopathy, and functional data across weather stations.
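To make the two-step idea concrete, the following is a minimal sketch of one way to combine trimmed k-means with hierarchical agglomeration. It is not the authors' implementation. The abstract does not state the merging criterion, so single linkage on the sub-cluster centroids is an assumption here, as are the Euclidean distance, the deterministic strided initialization (a practical version would more likely use several random restarts), and all names (`tk_merge_sketch`, `k`, `n_clusters`, `alpha`).

```python
import math


def _closest(points, centers, n_keep):
    """Return (distance, point index, center index) for the n_keep points
    that lie closest to their nearest center; the rest are trimmed."""
    scored = []
    for i, p in enumerate(points):
        d, j = min((math.dist(p, c), j) for j, c in enumerate(centers))
        scored.append((d, i, j))
    scored.sort()
    return scored[:n_keep]


def trimmed_kmeans(points, k, alpha, n_iter=100):
    """Step 1 (sketch): k-means in which a fraction alpha of the points,
    those farthest from every center, is excluded at each iteration."""
    n = len(points)
    n_keep = n - round(alpha * n)
    # Assumption: deterministic strided initialization, for reproducibility only.
    centers = [tuple(points[(i * n) // k]) for i in range(k)]
    for _ in range(n_iter):
        kept = _closest(points, centers, n_keep)
        new_centers = []
        for j in range(k):
            members = [points[i] for _, i, jj in kept if jj == j]
            if members:
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    labels = [-1] * n  # -1 marks trimmed points
    for _, i, j in _closest(points, centers, n_keep):
        labels[i] = j
    return centers, labels


def merge_centers(centers, n_clusters):
    """Step 2 (sketch): single-linkage agglomeration of the centroids
    until n_clusters groups remain. Returns {center index: group index}."""
    groups = [[j] for j in range(len(centers))]
    while len(groups) > n_clusters:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                d = min(math.dist(centers[i], centers[j])
                        for i in groups[a] for j in groups[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        groups[a].extend(groups.pop(b))
    return {j: g for g, members in enumerate(groups) for j in members}


def tk_merge_sketch(points, k, n_clusters, alpha):
    """Over-segment with trimmed k-means (k > n_clusters), then merge the
    sub-clusters; trimmed points keep the label -1."""
    centers, sub_labels = trimmed_kmeans(points, k, alpha)
    mapping = merge_centers(centers, n_clusters)
    return [mapping[j] if j >= 0 else -1 for j in sub_labels]
```

Choosing `k` larger than the target number of clusters lets several small, roughly spherical sub-clusters trace an elongated or otherwise non-spherical group, which the merging step then reassembles. In this sketch `alpha` is supplied by hand; the adaptive estimation of the contamination level mentioned in the abstract is not covered here.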