Title
BatMan: Mitigating Batch Effects via Stratification for Survival Outcome Prediction
Authors
Abstract
Reproducible translation of transcriptomics data has been hampered by the ubiquitous presence of batch effects. Statistical methods for managing batch effects were initially developed in the setting of sample group comparison and later borrowed for other settings such as survival outcome prediction. The most notable such method is ComBat, which adjusts for batches by including batch as a covariate alongside sample group in a linear regression. In survival prediction, however, ComBat is used without definable sample groups for the survival outcome, and it is applied sequentially, before the survival regression, to an outcome that is potentially confounded with batch. To address these issues, we propose a new method called BatMan ("BATch MitigAtion via stratificatioN"). It adjusts for batches as strata in survival regression and utilizes variable selection methods such as LASSO to handle high dimensionality. We assess the performance of BatMan in comparison with ComBat, each used either alone or in conjunction with data normalization, in a resampling-based simulation study under various levels of predictive signal strength and patterns of batch-outcome association. Our simulations show that (1) BatMan outperforms ComBat in nearly all scenarios when there are batch effects in the data, and (2) the performance of both methods can be worsened by the addition of data normalization. We further evaluate them using microRNA data for ovarian cancer from The Cancer Genome Atlas, and find that BatMan outperforms ComBat, while the addition of data normalization worsens the prediction. Our study thus shows the advantage of BatMan and raises caution about the naive use of data normalization in the context of developing survival prediction models. The BatMan method and the simulation tool for performance assessment are implemented in R and publicly available at https://github.com/LXQin/PRECISION.survival.
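To make the idea of adjusting for batches as strata more concrete, the following is a minimal sketch of a stratified Cox partial log-likelihood with an L1 (LASSO) penalty. It is not the authors' R implementation from the repository above: the function names, the toy data, and the Breslow-style handling of risk sets are all assumptions chosen for illustration. The key point is that each event is compared only against samples from its own batch that are still at risk, so each batch keeps its own baseline hazard.

```python
import math

def linear_predictor(beta, x):
    """Inner product of coefficients and one sample's features."""
    return sum(b * v for b, v in zip(beta, x))

def stratified_cox_loglik(beta, X, time, event, batch):
    """Cox partial log-likelihood with batch as strata (hypothetical helper).

    Risk sets are built within each batch only, so a batch-specific
    baseline hazard never enters the likelihood.
    """
    loglik = 0.0
    for b in sorted(set(batch)):
        members = [i for i in range(len(time)) if batch[i] == b]
        for i in members:
            if not event[i]:
                continue  # censored samples contribute only through risk sets
            # Samples from the same batch still at risk at time[i]
            risk = [j for j in members if time[j] >= time[i]]
            denom = sum(math.exp(linear_predictor(beta, X[j])) for j in risk)
            loglik += linear_predictor(beta, X[i]) - math.log(denom)
    return loglik

def lasso_objective(beta, X, time, event, batch, lam):
    """Negative stratified log-likelihood plus an L1 penalty (to be minimized)."""
    penalty = lam * sum(abs(b) for b in beta)
    return -stratified_cox_loglik(beta, X, time, event, batch) + penalty

# Toy data (made up for illustration): one feature, two batches.
X = [[0.5], [1.2], [-0.3], [0.8], [0.1], [-1.0]]
time = [5, 3, 8, 2, 7, 4]
event = [1, 1, 0, 1, 1, 0]
batch = ["A", "A", "A", "B", "B", "B"]
beta = [0.7]

# Add a constant shift to the feature in batch B only.
X_shifted = [[x[0] + 2.0] if b == "B" else list(x) for x, b in zip(X, batch)]

ll_strat = stratified_cox_loglik(beta, X, time, event, batch)
ll_strat_shifted = stratified_cox_loglik(beta, X_shifted, time, event, batch)

# Same data treated as a single stratum (no batch adjustment).
one_stratum = ["all"] * len(batch)
ll_pooled = stratified_cox_loglik(beta, X, time, event, one_stratum)
ll_pooled_shifted = stratified_cox_loglik(beta, X_shifted, time, event, one_stratum)
```

In this sketch, an additive shift applied to one batch cancels within that batch's risk sets, so `ll_strat` and `ll_strat_shifted` agree up to floating-point error, whereas the pooled likelihood changes. This illustrates one property of stratification only; it says nothing about the simulation results reported in the abstract, and a practical fit would minimize `lasso_objective` over `beta` with a dedicated solver.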