Thesis Title
Bayesian Bi-clustering Methods with Applications in Computational Biology
Author
Abstract
Bi-clustering is a useful approach for analyzing biological data when observations come from heterogeneous groups and have a large number of features. We outline a general Bayesian approach to tackling bi-clustering problems in moderate to high dimensions, and propose three Bayesian bi-clustering models for categorical data, which increase in complexity in how they model the distributions of features across bi-clusters. Our proposed methods apply to a wide range of scenarios: from situations where the data are cluster-distinguishable only in a small subset of features but are masked by a large amount of noise, to situations where different groups of data are identified by different sets of features or the data exhibit hierarchical structure. Through simulation studies, we show that our methods outperform existing (bi-)clustering methods in both identifying clusters and recovering feature distributional patterns across bi-clusters. We apply our methods to two genetic datasets, though their area of application is broader.