Paper title
Private Sequential Hypothesis Testing for Statisticians: Privacy, Error Rates, and Sample Size
Paper authors
Abstract
Sequential hypothesis testing is a class of statistical analyses in which the sample size is not fixed in advance. Instead, the decision process takes in new observations sequentially and makes real-time decisions, testing an alternative hypothesis against a null hypothesis until some stopping criterion is satisfied. In many common applications of sequential hypothesis testing, the data can be highly sensitive and may require privacy protection; for example, sequential hypothesis testing is used in clinical trials, where doctors collect data from patients sequentially and must determine when to stop recruiting patients and whether the treatment is effective. The field of differential privacy was developed to offer data analysis tools with strong privacy guarantees, and it has been widely applied to machine learning and statistical tasks. In this work, we study the sequential hypothesis testing problem under a slight variant of differential privacy known as Rényi differential privacy. We present a new private algorithm based on Wald's Sequential Probability Ratio Test (SPRT) that also gives strong theoretical privacy guarantees. We provide a theoretical analysis of statistical performance, measured by Type I and Type II error as well as the expected sample size. We also empirically validate our theoretical results on several synthetic databases, showing that our algorithm also performs well in practice. Unlike previous work in private hypothesis testing, which focused only on the classical fixed-sample setting, our results in the sequential setting allow a conclusion to be reached much earlier, thus saving the cost of collecting additional samples.
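For readers unfamiliar with the test the abstract builds on, the following is a minimal sketch of the classical, non-private Wald SPRT for Bernoulli data. It is not the paper's private algorithm, which the abstract does not specify. The function name `sprt_bernoulli`, its parameters, the Bernoulli model, and the use of Wald's approximate thresholds are all assumptions made for illustration.

```python
import math
import random


def sprt_bernoulli(samples, p0, p1, alpha=0.05, beta=0.05):
    """Classical (non-private) Wald SPRT for H0: p = p0 versus H1: p = p1.

    Illustrative sketch only: hypothetical helper, not the paper's method.
    Returns a (decision, number_of_samples_used) pair.
    """
    # Wald's approximate stopping thresholds on the log-likelihood ratio.
    upper = math.log((1 - beta) / alpha)  # at or above this: reject H0
    lower = math.log(beta / (1 - alpha))  # at or below this: accept H0

    llr = 0.0  # running log-likelihood ratio of H1 against H0
    n = 0
    for n, x in enumerate(samples, start=1):
        # Add the log-likelihood-ratio contribution of one observation.
        if x:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "reject H0", n
        if llr <= lower:
            return "accept H0", n
    # The data ran out before either threshold was crossed.
    return "undecided", n


if __name__ == "__main__":
    # Simulated data drawn under H1 (true success probability 0.7).
    rng = random.Random(0)
    data = [rng.random() < 0.7 for _ in range(1000)]
    print(sprt_bernoulli(data, p0=0.3, p1=0.7))
```

The test stops as soon as the accumulated evidence crosses either threshold, which is why the number of samples used is random rather than fixed in advance. A private variant would need to randomize this procedure in some way, but how the paper does so is not stated in the abstract.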