Paper title
AutoML Two-Sample Test
Paper authors
Paper abstract
Two-sample tests are important in statistics and machine learning, both as tools for scientific discovery and as a means of detecting distribution shifts. This has led to the development of many sophisticated test procedures that go beyond standard supervised learning frameworks and whose use can require specialized knowledge of two-sample testing. We use a simple test that takes the mean discrepancy of a witness function as the test statistic, and we prove that minimizing a squared loss leads to a witness with optimal testing power. This allows us to leverage recent advances in AutoML. Without any user input about the problem at hand, and using the same method for all our experiments, our AutoML two-sample test achieves competitive performance on a diverse distribution shift benchmark as well as on challenging two-sample testing problems. We provide an implementation of the AutoML two-sample test in the Python package autotst.
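The witness-based test described in the abstract can be illustrated with a minimal sketch. This is not the autotst implementation or its API: the function names `fit_witness` and `witness_two_sample_test` are hypothetical, a ridge-regularized linear regression stands in for the AutoML model, and the 50/50 data split, the 1/0 labels, and the permutation-based p-value are assumptions made for this example. The sketch fits a witness by minimizing a squared loss on one half of the data, then uses the difference of the witness means on the held-out half as the test statistic.

```python
import numpy as np


def fit_witness(x_train, y_train, ridge=1e-3):
    """Fit a linear witness by minimizing a ridge-regularized squared loss.

    Samples from P get label 1 and samples from Q get label 0 (an assumption
    of this sketch). The linear model is a stand-in for an AutoML regressor.
    """
    features = np.vstack([x_train, y_train])
    features = np.hstack([features, np.ones((features.shape[0], 1))])  # bias column
    labels = np.concatenate([np.ones(len(x_train)), np.zeros(len(y_train))])
    gram = features.T @ features + ridge * np.eye(features.shape[1])
    weights = np.linalg.solve(gram, features.T @ labels)
    return lambda z: np.hstack([z, np.ones((z.shape[0], 1))]) @ weights


def witness_two_sample_test(x, y, n_permutations=1000, seed=0):
    """Return (statistic, p_value) for H0: x and y come from the same distribution."""
    rng = np.random.default_rng(seed)
    x, y = rng.permutation(x), rng.permutation(y)  # shuffle rows before splitting
    x_train, x_test = x[: len(x) // 2], x[len(x) // 2:]
    y_train, y_test = y[: len(y) // 2], y[len(y) // 2:]

    witness = fit_witness(x_train, y_train)
    hx, hy = witness(x_test), witness(y_test)

    # Test statistic: mean discrepancy of the witness on held-out data.
    observed = hx.mean() - hy.mean()

    # Permutation null: reshuffle the held-out witness values between samples.
    pooled = np.concatenate([hx, hy])
    exceed = 0
    for _ in range(n_permutations):
        perm = rng.permutation(pooled)
        stat = perm[: len(hx)].mean() - perm[len(hx):].mean()
        exceed += stat >= observed
    p_value = (exceed + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    data_rng = np.random.default_rng(1)
    sample_p = data_rng.normal(loc=2.0, size=(400, 2))  # shifted mean
    sample_q = data_rng.normal(loc=0.0, size=(400, 2))
    print(witness_two_sample_test(sample_p, sample_q))
```

Because the witness is fit on one half and evaluated on the other, the permutation step only reshuffles scalar witness values, so any regression model trained with a squared loss could be swapped in for the linear one without changing the testing step.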