Paper Title
False Discovery Rate Control Under General Dependence By Symmetrized Data Aggregation
Paper Authors
Paper Abstract
We develop a new class of distribution-free multiple testing rules for false discovery rate (FDR) control under general dependence. A key element in our proposal is a symmetrized data aggregation (SDA) approach to incorporating the dependence structure via sample splitting, data screening and information pooling. The proposed SDA filter first constructs a sequence of ranking statistics that fulfill global symmetry properties, and then chooses a data-driven threshold along the ranking to control the FDR. The SDA filter substantially outperforms the knockoff method in power under moderate to strong dependence, and is more robust than existing methods based on asymptotic $p$-values. We first develop finite-sample theory to provide an upper bound for the actual FDR under general dependence, and then establish the asymptotic validity of SDA for both FDR and false discovery proportion (FDP) control under mild regularity conditions. The procedure is implemented in the R package \texttt{SDA}. Numerical results confirm the effectiveness and robustness of SDA in FDR control and show that it achieves substantial power gain over existing methods in many settings.
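The thresholding step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact rule, so the code below assumes the threshold form commonly used with symmetric ranking statistics (as in knockoff-style filters): the number of statistics below $-t$ serves as a proxy for the number of false discoveries above $t$. The function names `symmetric_threshold` and `select` are hypothetical and are not taken from the R package; the construction of the ranking statistics themselves (sample splitting, screening, pooling) is not shown.

```python
from typing import List, Optional, Sequence


def symmetric_threshold(w: Sequence[float], alpha: float) -> Optional[float]:
    """Return the smallest t > 0 whose estimated FDP is at most alpha.

    Assumed rule (not stated in the abstract):
        FDP_hat(t) = (1 + #{j: w_j <= -t}) / max(#{j: w_j >= t}, 1)
    Candidate thresholds are the distinct nonzero magnitudes |w_j|.
    Returns None when no candidate meets the target level.
    """
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)  # proxy for false discoveries
        positives = sum(1 for x in w if x >= t)   # candidate discoveries
        fdp_hat = (1 + negatives) / max(positives, 1)
        if fdp_hat <= alpha:
            return t
    return None


def select(w: Sequence[float], alpha: float) -> List[int]:
    """Indices of hypotheses whose statistic is at or above the threshold."""
    t = symmetric_threshold(w, alpha)
    if t is None:
        return []
    return [j for j, x in enumerate(w) if x >= t]
```

Under this assumed rule, large positive statistics indicate signal, while null statistics are expected to fall symmetrically around zero, which is why the negative tail can stand in for the unknown count of false positives.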