Paper Title
Machine Learning's Dropout Training is Distributionally Robust Optimal
Paper Authors
Paper Abstract
This paper shows that dropout training in Generalized Linear Models is the minimax solution of a two-player, zero-sum game in which an adversarial nature corrupts a statistician's covariates using a multiplicative nonparametric errors-in-variables model. In this game, nature's least favorable distribution is dropout noise, where nature independently deletes entries of the covariate vector with some fixed probability $\delta$. This result implies that dropout training provides out-of-sample expected loss guarantees for distributions that arise from multiplicative perturbations of in-sample data. In addition to the decision-theoretic analysis, the paper makes two more contributions. First, it gives a concrete recommendation on how to select the tuning parameter $\delta$ to guarantee that, as the sample size grows large, the in-sample loss after dropout training exceeds the true population loss with some pre-specified probability. Second, the paper provides a novel, parallelizable, Unbiased Multi-Level Monte Carlo algorithm to speed up the implementation of dropout training. Our algorithm has a much smaller computational cost than the naive implementation of dropout, provided the number of data points is much smaller than the dimension of the covariate vector.
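To make the dropout noise described in the abstract concrete, the following is a minimal sketch of the naive Monte Carlo dropout objective for a logistic-regression GLM. It is an illustration under stated assumptions, not the paper's algorithm: the function names (`dropout_mask`, `logistic_loss`, `dropout_loss`) are hypothetical, the choice of the logistic model is ours, and the rescaling of surviving entries by $1/(1-\delta)$ (so that the multiplicative noise has mean one) is a common convention that the abstract itself does not state.

```python
import numpy as np

def dropout_mask(shape, delta, rng):
    """Multiplicative noise: each entry is 0 with probability delta,
    otherwise 1/(1 - delta). The rescaling (an assumption here) gives mean 1."""
    keep = rng.random(shape) >= delta
    return keep / (1.0 - delta)

def logistic_loss(X, y, beta):
    """Average negative log-likelihood of a logistic GLM, labels y in {0, 1}."""
    z = X @ beta
    return np.mean(np.logaddexp(0.0, z) - y * z)

def dropout_loss(X, y, beta, delta, n_draws, rng):
    """Naive Monte Carlo estimate of the expected loss when every covariate
    entry is independently dropped with probability delta."""
    losses = [
        logistic_loss(X * dropout_mask(X.shape, delta, rng), y, beta)
        for _ in range(n_draws)
    ]
    return float(np.mean(losses))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))                 # synthetic covariates
    y = (rng.random(200) < 0.5).astype(float)     # synthetic binary labels
    beta = rng.normal(size=5)                     # arbitrary coefficient vector
    print("plain loss:  ", logistic_loss(X, y, beta))
    print("dropout loss:", dropout_loss(X, y, beta, delta=0.2, n_draws=500, rng=rng))
```

Dropout training would minimize `dropout_loss` over `beta`. The naive estimator above redraws the noise `n_draws` times for every evaluation, which is the cost that the Unbiased Multi-Level Monte Carlo algorithm mentioned in the abstract is meant to reduce; that algorithm is not reproduced here.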