Paper title
Minimax Quasi-Bayesian estimation in sparse canonical correlation analysis via a Rayleigh quotient function
Paper authors
Paper abstract
Canonical correlation analysis (CCA) is a popular statistical technique for exploring relationships between datasets. In recent years, the estimation of sparse canonical vectors has emerged as an important but challenging variant of the CCA problem, with widespread applications. Unfortunately, existing rate-optimal estimators for sparse canonical vectors have high computational cost. We propose a quasi-Bayesian estimation procedure that not only achieves the minimax estimation rate but is also easy to compute by Markov chain Monte Carlo (MCMC). The method builds on Tan et al. (2018) and uses a re-scaled Rayleigh quotient function as the quasi-log-likelihood. However, unlike Tan et al. (2018), we adopt a Bayesian framework that combines this quasi-log-likelihood with a spike-and-slab prior to regularize the inference and promote sparsity. We investigate the empirical behavior of the proposed method on both continuous and truncated data, and we demonstrate that it outperforms several state-of-the-art methods. As an application, we use the proposed methodology to maximally correlate clinical variables and proteomic data to better understand COVID-19.
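To make the Rayleigh quotient formulation concrete, the following is a minimal sketch of the standard (non-sparse, non-Bayesian) fact it rests on: the first sample canonical correlation is the maximum of a generalized Rayleigh quotient θᵀAθ / θᵀBθ built from sample covariance blocks. The abstract does not give the paper's re-scaling, the spike-and-slab prior, or the MCMC sampler, so none of those appear here. The function names (`rayleigh_quotient`, `cca_matrices`, `top_generalized_eigenpair`) and the simulated data are hypothetical, chosen only for illustration.

```python
import numpy as np

def rayleigh_quotient(theta, A, B):
    """Generalized Rayleigh quotient theta' A theta / theta' B theta."""
    return float(theta @ A @ theta) / float(theta @ B @ theta)

def cca_matrices(X, Y):
    """Build (A, B) from sample covariances (illustrative helper).

    A holds the cross-covariance blocks and B the within-set covariance
    blocks. Assumes n is larger than p and q so that B is positive definite.
    """
    n, p = X.shape
    q = Y.shape[1]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sx, Sy, Sxy = Xc.T @ Xc / n, Yc.T @ Yc / n, Xc.T @ Yc / n
    A = np.zeros((p + q, p + q))
    A[:p, p:] = Sxy
    A[p:, :p] = Sxy.T
    B = np.zeros((p + q, p + q))
    B[:p, :p] = Sx
    B[p:, p:] = Sy
    return A, B

def top_generalized_eigenpair(A, B):
    """Largest eigenpair of A v = lambda B v via Cholesky whitening of B."""
    L = np.linalg.cholesky(B)            # B = L L^T
    L_inv = np.linalg.inv(L)
    M = L_inv @ A @ L_inv.T              # symmetric, same eigenvalues
    vals, vecs = np.linalg.eigh(M)       # eigenvalues in ascending order
    theta = L_inv.T @ vecs[:, -1]        # map back to the original scale
    return float(vals[-1]), theta

# Simulated data (an assumption for illustration): one shared latent factor
# drives the first coordinate of X and the first coordinate of Y.
rng = np.random.default_rng(0)
n, p, q = 500, 4, 3
z = rng.normal(size=n)
X = rng.normal(size=(n, p))
Y = rng.normal(size=(n, q))
X[:, 0] += z
Y[:, 0] += z

A, B = cca_matrices(X, Y)
rho, theta = top_generalized_eigenpair(A, B)
print("first sample canonical correlation:", rho)
print("Rayleigh quotient at the maximizer:", rayleigh_quotient(theta, A, B))
```

The first `p` entries of `theta` correspond to the canonical vector for `X` and the remaining `q` entries to the one for `Y`. In the sparse setting described in the abstract, most of these entries would be expected to be zero, which is what the spike-and-slab prior is meant to encourage.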