Paper title
Integrative Sparse Partial Least Squares
Paper authors
Paper abstract
Partial least squares, as a dimension reduction method, has become increasingly important for its ability to handle problems with a large number of variables. Since noisy variables may weaken model performance, the sparse partial least squares (SPLS) technique has been proposed to identify important variables and generate more interpretable results. However, the small sample size of a single dataset limits the performance of conventional methods. An effective solution is to gather information from multiple comparable studies. Integrative analysis plays an important role among multi-dataset analyses. Its main idea is to improve estimation by assembling raw datasets and analyzing them jointly. In this paper, we develop an integrative SPLS (iSPLS) method using penalization based on the SPLS technique. The proposed approach consists of two penalties. The first penalty conducts variable selection in the context of integrative analysis; the second penalty, a contrasted one, is imposed to encourage the similarity of estimates across datasets and generate more reasonable and accurate results. Computational algorithms are provided. Simulation experiments are conducted to compare iSPLS with alternative approaches. The practical utility of iSPLS is shown in the analysis of two TCGA gene expression datasets.
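To make the two-penalty structure concrete, the following is a minimal sketch only. The abstract does not give the penalty formulas, so the forms used here are assumptions: a group-lasso-style term for joint variable selection across datasets, and a squared-difference term standing in for the contrasted penalty. The function name `integrative_penalty`, the matrix layout of `W`, and the tuning parameters `lam1` and `lam2` are all hypothetical and may differ from what the paper actually uses.

```python
from itertools import combinations

import numpy as np


def integrative_penalty(W, lam1, lam2):
    """Return (selection, contrast) penalty values for a coefficient matrix.

    W    : array of shape (p, M); column m is the direction vector
           estimated for dataset m (hypothetical layout).
    lam1 : weight of the selection penalty (assumed group-lasso form).
    lam2 : weight of the contrasted penalty (assumed squared-difference form).
    """
    W = np.asarray(W, dtype=float)

    # Selection term: the L2 norm of each variable's coefficients across
    # datasets, summed over variables. A variable is dropped from all
    # datasets at once when its whole row shrinks to zero.
    selection = lam1 * np.linalg.norm(W, axis=1).sum()

    # Contrast term: squared differences between every pair of datasets,
    # which is small when the estimates are similar across datasets.
    contrast = 0.0
    for m, k in combinations(range(W.shape[1]), 2):
        contrast += np.sum((W[:, m] - W[:, k]) ** 2)
    contrast *= 0.5 * lam2

    return selection, contrast
```

In such a sketch, the sum of the two values would be added to the SPLS objective of each dataset; larger `lam1` would select fewer variables, and larger `lam2` would pull the per-dataset estimates closer together.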