Paper title
Calibrating the scan statistic: finite sample performance vs. asymptotics
Paper authors
Paper abstract
We consider the problem of detecting an elevated mean on an interval with unknown location and length in the univariate Gaussian sequence model. Recent results have shown that using scale-dependent critical values for the scan statistic makes it possible to attain asymptotically optimal detection simultaneously for all signal lengths, thereby improving on the traditional scan, but this procedure has been criticized for losing too much power for short signals. We explain this discrepancy by showing that these asymptotic optimality results will necessarily be too imprecise to discern the performance of scan statistics in a practically relevant way, even in a large-sample context. Instead, we propose to assess performance with a new finite-sample criterion. We then present three calibrations for scan statistics that perform well across a range of relevant signal lengths. The first calibration uses a particular adjustment to the critical values and is therefore tailored to the Gaussian case. The second calibration uses a scale-dependent adjustment to the significance levels and is therefore applicable to arbitrary known null distributions. The third calibration restricts the scan to a particular sparse subset of the scan windows and then applies a weighted Bonferroni adjustment to the corresponding test statistics. This Bonferroni scan is also applicable to arbitrary null distributions and, in addition, is very simple to implement. We show how to apply these calibrations for scanning in a number of distributional settings: for normal observations with an unknown baseline and a known or unknown constant variance, for observations from a natural exponential family, for potentially heteroscedastic observations from a symmetric density by employing self-normalization in a novel way, and for exchangeable observations using tests based on permutations, ranks or signs.
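The structure of a Bonferroni scan, as summarized in the abstract, can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' calibration: it assumes unit-variance Gaussian observations with a zero baseline under the null, and the window lengths (powers of two), the start spacing (a quarter of the window length) and the scale weights (proportional to 1/(k+1), normalized to sum to 1) are hypothetical choices made only for illustration. What the sketch does show is the shape of the weighted Bonferroni adjustment: each of the N_k windows at scale k is tested at level alpha * w_k / N_k, so by the union bound the family-wise error under the null is at most alpha * sum_k w_k = alpha, for any valid p-values.

```python
import math
from statistics import NormalDist


def sparse_windows(n):
    """Hypothetical sparse window set: lengths 2^k <= n, with window starts
    spaced by max(1, length // 4). Returns a list of (length, starts)."""
    scales = []
    length = 1
    while length <= n:
        step = max(1, length // 4)
        starts = list(range(0, n - length + 1, step))
        scales.append((length, starts))
        length *= 2
    return scales


def bonferroni_scan(x, alpha=0.05):
    """Sketch of a weighted Bonferroni scan for an elevated mean.

    Assumes x_i ~ N(0, 1) under the null. Returns detected windows as
    (start, end) pairs with end exclusive.
    """
    n = len(x)
    # Prefix sums so that each window sum costs O(1).
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)

    scales = sparse_windows(n)
    # Hypothetical scale weights proportional to 1/(k+1), normalized to sum to 1.
    raw = [1.0 / (k + 1) for k in range(len(scales))]
    total = sum(raw)
    weights = [r / total for r in raw]

    std_normal = NormalDist()
    detections = []
    for (length, starts), w in zip(scales, weights):
        # Bonferroni within the scale: level alpha * w_k spread over N_k windows.
        threshold = alpha * w / len(starts)
        for s in starts:
            z = (prefix[s + length] - prefix[s]) / math.sqrt(length)
            # One-sided p-value; cdf(-z) avoids cancellation in 1 - cdf(z).
            p = std_normal.cdf(-z)
            if p <= threshold:
                detections.append((s, s + length))
    return detections
```

For example, a sequence of 256 zeros with the block x[64:128] raised to 3.0 yields the window (64, 128) among the detections, while an all-zero sequence yields none. Because only the p-values enter the adjustment, the Gaussian p-value line could be swapped for any other valid test of a window without changing the error guarantee.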