Paper title
Online FDR Control for RNAseq Data
Authors
Abstract
Motivation: While the analysis of a single RNA sequencing (RNAseq) dataset has been well described in the literature, modern research workflows often have additional complexity in that related RNAseq experiments are performed sequentially over time. The simplest and most widely used analysis strategy ignores the temporal aspects and analyses each dataset separately. However, this can lead to substantial inflation of the overall false discovery rate (FDR). We propose applying recently developed methodology for online hypothesis testing to analyse sequential RNAseq experiments in a principled way, guaranteeing FDR control at all times while never changing past decisions.
Results: We show that standard offline approaches have variable control of FDR of related RNAseq experiments over time and a naively composed approach may improperly change historic decisions. We demonstrate that the online FDR algorithms are a principled way to guarantee control of FDR. Furthermore, in certain simulation scenarios, we observe empirically that online approaches have comparable power to offline approaches.
Availability and Implementation: The onlineFDR package is freely available at http://www.bioconductor.org/packages/onlineFDR. Additional code used for the simulation studies can be found at https://github.com/latlio/onlinefdr_rnaseq_simulation.
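To make the online setting concrete, the following is a minimal Python sketch of one simple online FDR rule from the literature, LOND (Levels based On Number of Discoveries). It is an illustration only: the abstract does not say which online algorithms were evaluated, the function name `lond` and the choice of weights `beta_i = alpha * 6 / (pi^2 * i^2)` are assumptions made here, and the code is not the API of the onlineFDR package. LOND is known to control the FDR when the p-values are independent. Each hypothesis is tested once, at a level that depends only on how many discoveries have already been made, so earlier decisions are never revisited.

```python
import math


def lond(p_values, alpha=0.05):
    """Illustrative LOND rule: test p-values in arrival order.

    Hypothesis i is tested at level alpha_i = beta_i * (D + 1), where D is
    the number of rejections made before i and the beta_i are non-negative
    weights that sum to alpha over all i (assumed form: alpha * 6 / (pi^2 i^2)).
    Returns a list of booleans, True where the hypothesis is rejected.
    """
    decisions = []
    discoveries = 0  # D: rejections made so far
    for i, p in enumerate(p_values, start=1):
        beta_i = alpha * 6.0 / (math.pi ** 2 * i ** 2)
        alpha_i = beta_i * (discoveries + 1)
        reject = p <= alpha_i
        decisions.append(reject)
        if reject:
            discoveries += 1
    return decisions


# Hypothetical p-values arriving over time, e.g. from successive experiments.
stream = [1e-6, 0.5, 1e-4, 0.9]
print(lond(stream))
```

Because the test level for hypothesis i uses only information available before i, running the rule on a longer stream reproduces the earlier decisions unchanged. This is the property the abstract describes as never changing past decisions.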