Paper Title
On Mixup Regularization
Paper Authors
Paper Abstract
Mixup is a data augmentation technique that creates new examples as convex combinations of training points and labels. This simple technique has been shown empirically to improve the accuracy of many state-of-the-art models in different settings and applications, but the reasons behind this empirical success remain poorly understood. In this paper, we take a substantial step toward explaining the theoretical foundations of Mixup by clarifying its regularization effects. We show that Mixup can be interpreted as a standard empirical risk minimization estimator subject to a combination of data transformation and random perturbation of the transformed data. We gain two core insights from this new interpretation. First, the data transformation suggests that, at test time, a model trained with Mixup should also be applied to transformed data, a one-line change in code that we show empirically improves both the accuracy and the calibration of the predictions. Second, we show how the random perturbation in this new interpretation of Mixup induces multiple known regularization schemes, including label smoothing and reduction of the Lipschitz constant of the estimator. These schemes interact synergistically, resulting in a self-calibrated and effective regularization effect that prevents overfitting and overconfident predictions. We corroborate our theoretical analysis with experiments that support our conclusions.
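For concreteness, the convex-combination step the abstract describes can be sketched in a few lines. This is a minimal NumPy sketch, not code from the paper; drawing the mixing weight from a Beta(alpha, alpha) distribution follows the usual Mixup formulation and is an assumption here, not a detail stated in the abstract.

```python
import numpy as np

def mixup_batch(x, y, alpha=0.2, rng=None):
    """Return a mixed batch (x_tilde, y_tilde) from inputs x and one-hot labels y."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)                 # mixing weight lambda in [0, 1]
    perm = rng.permutation(len(x))               # random partner for each example
    x_tilde = lam * x + (1.0 - lam) * x[perm]    # convex combination of inputs
    y_tilde = lam * y + (1.0 - lam) * y[perm]    # convex combination of labels
    return x_tilde, y_tilde

# Usage: mix a toy batch of four 2-D points with three classes.
x = np.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [1.0, 1.0]])
y = np.eye(3)[[0, 1, 2, 0]]                      # one-hot labels
x_mix, y_mix = mixup_batch(x, y, alpha=0.2)
```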
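The abstract also mentions that a Mixup-trained model should be applied to transformed data at test time, "a one-line change in code", without spelling the transformation out. A minimal sketch of what such a change could look like, assuming the transformation shrinks each test point toward the mean of the training inputs by the expected mixing weight; `model`, `train_mean`, and `mean_lam` are illustrative names, not the paper's API.

```python
def predict_with_mixup_transform(model, x_test, train_mean, mean_lam):
    # One-line change: evaluate the model on the transformed input
    # (assumed form: shrink x_test toward train_mean by mean_lam)
    # instead of on the raw input.
    return model(train_mean + mean_lam * (x_test - train_mean))
```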