Paper Title
Explicit Regularization in Overparametrized Models via Noise Injection
Paper Authors
Paper Abstract
Injecting noise within gradient descent has several desirable features, such as smoothing and regularizing properties. In this paper, we investigate the effects of injecting noise before computing a gradient step. We demonstrate that small perturbations can induce explicit regularization for simple models based on the L1-norm, group L1-norms, or nuclear norms. However, when applied to overparametrized neural networks with large widths, we show that the same perturbations can cause variance explosion. To overcome this, we propose using independent layer-wise perturbations, which provably allow for explicit regularization without variance explosion. Our empirical results show that these small perturbations lead to improved generalization performance compared to vanilla gradient descent.
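To make the two update rules described in the abstract concrete, below is a minimal PyTorch sketch (not the authors' code): `perturbed_gd_step` injects noise into all parameters before the gradient is computed, while `layerwise_perturbed_gd_step` perturbs each layer independently when computing that layer's gradient. The toy two-layer model, data, and hyperparameters (`sigma`, `lr`) are illustrative assumptions, and the exact layer-wise scheme in the paper may differ in detail.

```python
# Minimal sketch of noise injection before the gradient step (illustrative only).
import torch

torch.manual_seed(0)

# Toy regression data and a small two-layer linear model (illustrative choices).
X, y = torch.randn(128, 10), torch.randn(128, 1)

def init_params():
    w1 = 0.1 * torch.randn(10, 64)
    w2 = 0.1 * torch.randn(64, 1)
    return [w1.requires_grad_(), w2.requires_grad_()]

def loss(params, x, targets):
    w1, w2 = params
    return ((x @ w1 @ w2 - targets) ** 2).mean()

def perturbed_gd_step(params, lr=0.01, sigma=0.01):
    """Joint perturbation: add noise to all parameters, then take the gradient
    step evaluated at the perturbed point (this is the variant whose variance
    can blow up for very wide networks)."""
    perturbed = [(p + sigma * torch.randn_like(p)).detach().requires_grad_()
                 for p in params]
    grads = torch.autograd.grad(loss(perturbed, X, y), perturbed)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p -= lr * g

def layerwise_perturbed_gd_step(params, lr=0.01, sigma=0.01):
    """Layer-wise variant: each layer's gradient is computed with independent
    noise added to that layer only, keeping the other layers unperturbed."""
    grads = []
    for k in range(len(params)):
        perturbed = [p.detach().clone() for p in params]
        perturbed[k] = (perturbed[k]
                        + sigma * torch.randn_like(perturbed[k])).requires_grad_()
        grads.append(torch.autograd.grad(loss(perturbed, X, y), perturbed[k])[0])
    with torch.no_grad():
        for p, g in zip(params, grads):
            p -= lr * g

params = init_params()
for step in range(200):
    layerwise_perturbed_gd_step(params)
print("final training loss:", loss(params, X, y).item())
```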