Paper Title
Directionally Dependent Multi-View Clustering Using Copula Model
Paper Authors
Paper Abstract
In recent biomedical research, a fundamental problem is to integratively cluster a set of objects using datasets from multiple sources. Such problems arise mostly in genomics, where data are collected from various sources and typically represent distinct yet complementary information. Integrating these data sources for multi-source clustering is challenging because of their complex dependence structure, which includes directional dependence. In genomics studies in particular, a certain directional dependence is known to exist between DNA expression, DNA methylation, and RNA expression, widely called the Central Dogma. Most existing multi-view clustering methods assume either an independent structure or pairwise (non-directional) dependence, thereby ignoring the directional relationship. Motivated by this, we propose a copula-based multi-view clustering model in which a copula enables the model to accommodate the directional dependence present in the datasets. We conduct a simulation experiment in which the simulated datasets exhibit inherent directional dependence; the results show that ignoring the directional dependence negatively affects clustering performance. As a real application, we apply our model to breast cancer tumor samples collected from The Cancer Genome Atlas (TCGA).