Paper title

clusterBMA: Bayesian model averaging for clustering

Authors

Owen Forbes, Edgar Santos-Fernandez, Paul Pao-Yen Wu, Hong-Bo Xie, Paul E. Schwenn, Jim Lagopoulos, Lia Mills, Dashiell D. Sacks, Daniel F. Hermens, Kerrie Mengersen

Abstract

Various methods have been developed to combine inference across multiple sets of results for unsupervised clustering, within the ensemble clustering literature. The approach of reporting results from one 'best' model out of several candidate clustering models generally ignores the uncertainty that arises from model selection, and results in inferences that are sensitive to the particular model and parameters chosen. Bayesian model averaging (BMA) is a popular approach for combining results across multiple models that offers some attractive benefits in this setting, including probabilistic interpretation of the combined cluster structure and quantification of model-based uncertainty. In this work we introduce clusterBMA, a method that enables weighted model averaging across results from multiple unsupervised clustering algorithms. We use clustering internal validation criteria to develop an approximation of the posterior model probability, used for weighting the results from each model. From a consensus matrix representing a weighted average of the clustering solutions across models, we apply symmetric simplex matrix factorisation to calculate final probabilistic cluster allocations. In addition to outperforming other ensemble clustering methods on simulated data, clusterBMA offers unique features including probabilistic allocation to averaged clusters, combining allocation probabilities from 'hard' and 'soft' clustering algorithms, and measuring model-based uncertainty in averaged cluster allocation. This method is implemented in an accompanying R package of the same name.
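
The weighted-averaging step described in the abstract can be illustrated with a minimal Python sketch. It is not the clusterBMA R package and does not reproduce the paper's weighting or factorisation: the function names are hypothetical, the model weights are placeholder numbers standing in for the approximate posterior model probabilities, and the soft-clustering similarity is assumed to be the inner product of allocation-probability rows. The symmetric simplex matrix factorisation step is omitted.

```python
def similarity_from_hard(labels):
    """Pairwise similarity for a 'hard' clustering: 1.0 if two points share a label."""
    n = len(labels)
    return [[1.0 if labels[i] == labels[j] else 0.0 for j in range(n)]
            for i in range(n)]


def similarity_from_soft(probs):
    """Pairwise similarity for a 'soft' clustering.

    Assumption: similarity is the inner product of the two points'
    allocation-probability vectors (each row of `probs` sums to 1).
    """
    n = len(probs)
    return [[sum(p * q for p, q in zip(probs[i], probs[j])) for j in range(n)]
            for i in range(n)]


def weighted_consensus(similarities, weights):
    """Weighted average of per-model similarity matrices.

    `weights` are placeholders for approximate posterior model
    probabilities; they are normalised here to sum to 1.
    """
    total = sum(weights)
    norm = [w / total for w in weights]
    n = len(similarities[0])
    return [[sum(w * s[i][j] for w, s in zip(norm, similarities))
             for j in range(n)]
            for i in range(n)]


# Two candidate models for four data points: one hard, one soft.
hard = similarity_from_hard([0, 0, 1, 1])
soft = similarity_from_soft([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]])

# Placeholder weights (normalised to 0.75 and 0.25 inside the function).
consensus = weighted_consensus([hard, soft], weights=[3.0, 1.0])
```

In this toy case, points 0 and 1 receive a high consensus similarity because both models place them together, while points 0 and 2 receive a low one. A factorisation of the consensus matrix would then be needed to turn these pairwise values into per-point cluster allocation probabilities.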
