Paper Title

Learning Sparsity and Block Diagonal Structure in Multi-View Mixture Models

Paper Author

Carmichael, Iain

Paper Abstract

Scientific studies increasingly collect multiple modalities of data to investigate a phenomenon from several perspectives. In integrative data analysis it is important to understand how information is heterogeneously spread across these different data sources. To this end, we consider a parametric clustering model for the subjects in a multi-view data set (i.e. multiple sources of data from the same set of subjects) where each view marginally follows a mixture model. In the case of two views, the dependence between them is captured by a cluster membership matrix parameter and we aim to learn the structure of this matrix (e.g. the zero pattern). First, we develop a penalized likelihood approach to estimate the sparsity pattern of the cluster membership matrix. For the specific case of block diagonal structures, we develop a constrained likelihood formulation where this matrix is constrained to be block diagonal up to permutations of the rows and columns. To enforce block diagonal constraints we propose a novel optimization approach based on the symmetric graph Laplacian. We demonstrate the performance of these methods through both simulations and applications to data sets from cancer genetics and neuroscience. Both methods naturally extend to multiple views.
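The link between block diagonal structure and the graph Laplacian can be illustrated with a standard spectral fact: the number of zero eigenvalues of a graph's symmetric normalized Laplacian equals its number of connected components. The sketch below is not the paper's optimization method, which the abstract does not spell out. It assumes the cluster membership matrix is treated as the weighted adjacency of a bipartite graph (row clusters on one side, column clusters on the other) with no all-zero rows or columns. The function name `count_blocks` and the example matrix `Pi` are hypothetical.

```python
import numpy as np

def count_blocks(Pi, tol=1e-8):
    """Count blocks of Pi (up to row/column permutation) via Laplacian eigenvalues.

    Assumption: Pi is a nonnegative R x C matrix with no all-zero row or column.
    """
    Pi = np.asarray(Pi, dtype=float)
    R, C = Pi.shape

    # Bipartite adjacency: rows of Pi are one vertex set, columns the other.
    A = np.zeros((R + C, R + C))
    A[:R, R:] = Pi
    A[R:, :R] = Pi.T

    deg = A.sum(axis=1)
    if np.any(deg <= 0):
        raise ValueError("every row and column needs at least one nonzero entry")

    # Symmetric normalized Laplacian: I - D^{-1/2} A D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L_sym = np.eye(R + C) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

    # Multiplicity of eigenvalue 0 = number of connected components = blocks.
    eigvals = np.linalg.eigvalsh(L_sym)
    return int(np.sum(eigvals < tol))

# Hypothetical membership matrix with two blocks (entries sum to 1).
Pi = np.array([
    [0.2, 0.1, 0.0, 0.0],
    [0.1, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.3, 0.2],
])

# Shuffling rows and columns hides the blocks visually but not spectrally.
Pi_shuffled = Pi[[2, 0, 1], :][:, [3, 0, 2, 1]]

print(count_blocks(Pi))           # two blocks
print(count_blocks(Pi_shuffled))  # still two blocks after permutation
```

Because the count depends only on the eigenvalues, it is unchanged by permuting rows and columns, which is why a Laplacian-based condition can express "block diagonal up to permutations" without searching over the permutations themselves.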
