Paper Title
Resonance in Weight Space: Covariate Shift Can Drive Divergence of SGD with Momentum
Paper Authors
Paper Abstract
Most convergence guarantees for stochastic gradient descent with momentum (SGDm) rely on i.i.d. sampling. Yet SGDm is often used outside this regime, in settings with temporally correlated input samples such as continual learning and reinforcement learning. Existing work has shown that SGDm with a decaying step-size can converge under Markovian temporal correlation. In this work, we show that SGDm under covariate shift with a fixed step-size can be unstable and diverge. In particular, we show that SGDm under covariate shift is a parametric oscillator, and so can suffer from a phenomenon known as resonance. We approximate the learning system as a time-varying system of ordinary differential equations and leverage existing theory to characterize the system's divergence/convergence as resonant/nonresonant modes. Because the theoretical result is limited to the linear setting with periodic covariate shift, we supplement it empirically, showing that resonance phenomena persist under non-periodic covariate shift, nonlinear dynamics with neural networks, and optimizers other than SGDm.
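The parametric-oscillator claim can be made concrete with a small simulation. Below is a minimal sketch, assuming a heavy-ball form of SGDm on a 1-D quadratic loss whose curvature oscillates to mimic periodic covariate shift; the step size, momentum, modulation depth, and swept frequencies are illustrative choices, not the paper's experimental settings.

```python
from math import cos

# A minimal sketch (toy setup assumed for illustration, not the paper's
# experiments): heavy-ball SGDm on a 1-D quadratic loss 0.5 * h_t * w^2
# whose curvature h_t = E[x_t^2] oscillates periodically, mimicking
# periodic covariate shift. Every frozen curvature in [0.1, 1.9] is
# stable at this step size, so any divergence must come from the time
# variation itself: parametric resonance.

ALPHA, BETA, T = 0.1, 0.9, 2000  # fixed step size, momentum, horizon

def final_weight_magnitude(omega):
    """Run T steps of SGDm with curvature h_t = 1 + 0.9*cos(omega*t)."""
    w, v = 1.0, 0.0
    for t in range(T):
        h_t = 1.0 + 0.9 * cos(omega * t)  # time-varying curvature
        v = BETA * v - ALPHA * h_t * w    # momentum buffer update
        w = w + v                         # weight update
    return abs(w)

# Sweep the shift frequency: most frequencies contract toward w = 0, but
# frequencies near twice the oscillator's per-step rotation angle
# (roughly omega ~ 0.6 in this toy setup) can blow up: the resonant mode.
for omega in (0.1, 0.3, 0.5, 0.6, 0.64, 0.8, 1.0):
    print(f"omega = {omega:4.2f} -> |w_T| = {final_weight_magnitude(omega):.2e}")
```

In this sketch, the divergent frequencies in the sweep correspond to the resonant modes of the abstract's characterization, while the remaining frequencies contract, matching the nonresonant case.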