Paper Title

Quantitative Propagation of Chaos for SGD in Wide Neural Networks

Paper Authors

Valentin De Bortoli, Alain Durmus, Xavier Fontaine, Umut Simsekli

Paper Abstract


In this paper, we investigate the limiting behavior of a continuous-time counterpart of the Stochastic Gradient Descent (SGD) algorithm applied to two-layer overparameterized neural networks, as the number of neurons (i.e., the size of the hidden layer) $N \to +\infty$. Following a probabilistic approach, we show "propagation of chaos" for the particle system defined by this continuous-time dynamics under different scenarios, indicating that the statistical interaction between the particles asymptotically vanishes. In particular, we establish quantitative convergence with respect to $N$ of any particle to a solution of a mean-field McKean-Vlasov equation in the metric space endowed with the Wasserstein distance. In comparison to previous works on the subject, we consider settings in which the sequence of stepsizes in SGD can potentially depend on the number of neurons and the iterations. We then identify two regimes under which different mean-field limits are obtained, one of them corresponding to an implicitly regularized version of the minimization problem at hand. We perform various experiments on real datasets to validate our theoretical results, assessing the existence of these two regimes on classification problems and illustrating our convergence results.
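To make the setting concrete, the sketch below shows SGD on a two-layer network in a mean-field parameterization, $f(x) = \frac{1}{N}\sum_{i=1}^N a_i\,\sigma(\langle w_i, x\rangle)$, where each neuron plays the role of one "particle". This is a minimal illustration, not the authors' code: the squared loss, the $\tanh$ activation, the synthetic data, and the particular stepsize schedule `eta0 / (1 + 0.01 * k)` with an $N$-dependent rescaling are all assumptions chosen only to show how the stepsize may depend on both the number of neurons $N$ and the iteration index $k$, as discussed in the abstract.

```python
import numpy as np

# Minimal sketch (assumed setup, not the paper's implementation):
# SGD on a two-layer network with mean-field scaling 1/N.

rng = np.random.default_rng(0)

def sigma(z):
    return np.tanh(z)

def d_sigma(z):
    return 1.0 - np.tanh(z) ** 2

def sgd_two_layer(X, y, N, n_steps, eta0=1.0):
    d = X.shape[1]
    a = rng.normal(size=N)        # output weights: one "particle" per neuron
    W = rng.normal(size=(N, d))   # input weights of each neuron
    for k in range(n_steps):
        i = rng.integers(len(X))  # one sample per step (stochastic gradient)
        x, target = X[i], y[i]
        pre = W @ x                           # pre-activations, shape (N,)
        pred = np.mean(a * sigma(pre))        # mean-field scaling 1/N
        err = pred - target
        eta = eta0 / (1.0 + 0.01 * k)         # hypothetical stepsize schedule
        # Gradients of the squared loss; the 1/N factor comes from the
        # mean-field parameterization of the network output.
        grad_a = err * sigma(pre) / N
        grad_W = (err * a * d_sigma(pre) / N)[:, None] * x[None, :]
        # Rescaling the stepsize by N is one possible regime in which each
        # particle moves at order-one speed as N grows.
        a -= eta * N * grad_a
        W -= eta * N * grad_W
    return a, W

# Toy usage on synthetic binary-labeled data.
X = rng.normal(size=(200, 5))
y = np.sign(X[:, 0])
a, W = sgd_two_layer(X, y, N=1000, n_steps=500)
```

Under this kind of scaling, increasing `N` while keeping the schedule fixed is the regime in which the particles are expected to decouple and each neuron's trajectory approaches the mean-field limit described above.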
