Paper title
Critical Values Robust to P-hacking
Authors
Abstract
P-hacking is prevalent in reality but absent from classical hypothesis testing theory. As a consequence, significant results are much more common than they are supposed to be when the null hypothesis is in fact true. In this paper, we build a model of hypothesis testing with p-hacking. From the model, we construct critical values such that, if the values are used to determine significance, and if scientists' p-hacking behavior adjusts to the new significance standards, significant results occur with the desired frequency. Such robust critical values allow for p-hacking so they are larger than classical critical values. To illustrate the amount of correction that p-hacking might require, we calibrate the model using evidence from the medical sciences. In the calibrated model the robust critical value for any test statistic is the classical critical value for the same test statistic with one fifth of the significance level.
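The calibrated rule in the last sentence can be illustrated with a small worked example. The sketch below assumes a standard normal (z) test statistic, which is only one possible case, since the abstract states the rule for any test statistic. The function names and the `correction` parameter are hypothetical; the factor of 5 is the one-fifth figure from the calibrated model.

```python
from statistics import NormalDist


def classical_critical_value(alpha: float, two_sided: bool = True) -> float:
    """Classical critical value of a z statistic at significance level alpha."""
    # For a two-sided test, alpha is split evenly between the two tails.
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)


def robust_critical_value(alpha: float, correction: float = 5.0,
                          two_sided: bool = True) -> float:
    """Robust critical value under the calibrated rule from the abstract.

    It is the classical critical value evaluated at alpha / correction.
    correction = 5 is the abstract's one-fifth figure; the z statistic
    is an assumption made for this illustration.
    """
    return classical_critical_value(alpha / correction, two_sided)


# Two-sided z-test at the 5% significance level.
print(round(classical_critical_value(0.05), 2))  # 1.96
print(round(robust_critical_value(0.05), 2))     # 2.58
```

Under these assumptions, a two-sided z-test at the 5% level would require |z| above roughly 2.58 rather than 1.96, which is the classical critical value at the 1% level.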