Paper title

Inference with approximate local false discovery rates

Authors

Rajesh Karmakar, Ruth Heller, Saharon Rosset

Abstract

Efron's two-group model is widely used in large-scale multiple testing. This model assumes that the test statistics are mutually independent; in realistic settings, however, they are typically dependent, and taking the dependence into account can boost power. The general two-group model takes the dependence between the test statistics into account. Optimal policies in the general two-group model require calculating, for each hypothesis, the probability that it is a true null given all test statistics, denoted the local false discovery rate (locFDR). Unfortunately, calculating locFDRs under realistic dependence structures can be computationally prohibitive. We propose calculating approximate locFDRs based on a properly defined N-neighborhood for each hypothesis. We prove that thresholding the approximate locFDRs at a fixed threshold controls the marginal false discovery rate for any dependence structure. Furthermore, we prove that this is the optimal procedure in a restricted class of decision rules, where the decision for each hypothesis is guided only by its N-neighborhood. We show through extensive simulations that our proposed method achieves substantial power gains compared to alternative practical approaches, while maintaining conceptual simplicity and computational feasibility. We demonstrate the utility of our method on a genome-wide association study of height.
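To make the idea of an approximate locFDR concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simple Gaussian instance of the general two-group model that the abstract does not specify: hypothesis states are i.i.d. Bernoulli with non-null probability `pi1`, non-null z-scores share a fixed mean `mu`, and the z-scores are jointly normal with a known covariance matrix `sigma`. The N-neighborhood is taken here to be the `n_nbr` adjacent indices on each side of hypothesis `i`, which is also an assumption. The function names (`cholesky`, `mvn_logpdf`, `approx_locfdr`) are hypothetical. The posterior null probability is obtained by enumerating all state configurations inside the neighborhood, so the cost grows as 2^(2N+1) rather than 2^m.

```python
import itertools
import math

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    return l

def mvn_logpdf(x, mean, chol):
    """Log-density of N(mean, L L^T) at x, given the Cholesky factor L."""
    n = len(x)
    y = []
    # Forward substitution: solve L y = x - mean.
    for i in range(n):
        r = x[i] - mean[i] - sum(chol[i][k] * y[k] for k in range(i))
        y.append(r / chol[i][i])
    log_det = 2.0 * sum(math.log(chol[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2 * math.pi) + log_det + sum(v * v for v in y))

def approx_locfdr(z, sigma, i, n_nbr, pi1, mu):
    """P(H_i is null | z-scores in the neighborhood of i), by enumeration.

    Assumed model (not taken from the abstract): h_j ~ Bernoulli(pi1) i.i.d.,
    and z | h ~ N(mu * h, sigma).
    """
    idx = list(range(max(0, i - n_nbr), min(len(z), i + n_nbr + 1)))
    pos = idx.index(i)
    z_s = [z[j] for j in idx]
    chol = cholesky([[sigma[a][b] for b in idx] for a in idx])
    null_mass = total_mass = 0.0
    # Sum the joint weight of every state configuration in the neighborhood.
    for h in itertools.product((0, 1), repeat=len(idx)):
        k = sum(h)
        prior = (pi1 ** k) * ((1 - pi1) ** (len(idx) - k))
        w = prior * math.exp(mvn_logpdf(z_s, [mu * hj for hj in h], chol))
        total_mass += w
        if h[pos] == 0:
            null_mass += w
    return null_mass / total_mass

# Illustrative use: AR(1)-type correlation, reject when the approximate
# locFDR falls at or below a fixed threshold (all values are made up).
z = [0.4, 2.9, 3.3, -0.2, 1.1]
rho = 0.5
sigma = [[rho ** abs(a - b) for b in range(len(z))] for a in range(len(z))]
threshold = 0.2
rejected = [i for i in range(len(z))
            if approx_locfdr(z, sigma, i, n_nbr=1, pi1=0.1, mu=3.0) <= threshold]
```

With an identity covariance the neighbors carry no information, and the quantity reduces to the usual marginal locFDR of the independent two-group model, which is a convenient sanity check for the sketch.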
