Paper Title
Improved generalization by noise enhancement
Paper Authors
Paper Abstract
Recent studies have demonstrated that noise in stochastic gradient descent (SGD) is closely related to generalization: a larger SGD noise, if not too large, results in better generalization. Since the covariance of the SGD noise is proportional to $\eta^2/B$, where $\eta$ is the learning rate and $B$ is the minibatch size of SGD, the SGD noise has so far been controlled by changing $\eta$ and/or $B$. However, an excessively large $\eta$ destabilizes the training dynamics, while a small $B$ prevents scalable parallel computation. It is therefore desirable to control the SGD noise without changing $\eta$ or $B$. In this paper, we propose a method that achieves this goal through ``noise enhancement'', which is easy to implement in practice. We expound the underlying theoretical idea and demonstrate that noise enhancement actually improves generalization on real datasets. It turns out that large-batch training with noise enhancement even generalizes better than small-batch training.
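The abstract does not spell out how the noise is enhanced without changing $\eta$ or $B$, so the following is a minimal illustrative sketch rather than the authors' exact algorithm. It assumes one simple way to realize the idea: draw two independent minibatches $B_1$ and $B_2$ of the same size and combine their gradients as $(1+\alpha)\,g_{B_1} - \alpha\,g_{B_2}$. Since each minibatch gradient is an unbiased estimate of the full gradient, this combination leaves the expected gradient unchanged while amplifying the noise covariance by a factor $(1+\alpha)^2 + \alpha^2$, with $\eta$ and $B$ untouched. The names `minibatch_grad` and `alpha` are ours.

```python
# Hypothetical sketch of noise-enhanced SGD (not necessarily the authors'
# implementation): combine two independent minibatch gradients so the mean
# gradient is preserved but the noise covariance grows by (1+alpha)^2 + alpha^2.
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem: minimize L(w) = (1/2N) * ||X w - y||^2.
N, d = 1000, 10
X = rng.normal(size=(N, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=N)

def minibatch_grad(w, idx):
    """Gradient of the least-squares loss on the minibatch indexed by `idx`."""
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ w - yb) / len(idx)

eta, B, alpha = 0.05, 100, 1.0   # alpha = 0 recovers plain SGD
w = np.zeros(d)
for step in range(2000):
    # Two independent minibatches of the same size B.
    idx1 = rng.choice(N, size=B, replace=False)
    idx2 = rng.choice(N, size=B, replace=False)
    g1, g2 = minibatch_grad(w, idx1), minibatch_grad(w, idx2)
    g = (1 + alpha) * g1 - alpha * g2   # noise-enhanced gradient estimate
    w -= eta * g

print("final loss:", 0.5 * np.mean((X @ w - y) ** 2))
```

Setting `alpha = 0` recovers plain minibatch SGD, which makes it straightforward to compare ordinary and noise-enhanced training at a fixed learning rate and batch size, as the abstract's large-batch versus small-batch comparison requires.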