Paper Title
Spherical Motion Dynamics: Learning Dynamics of Neural Network with Normalization, Weight Decay, and SGD
Paper Authors
Paper Abstract
In this work, we comprehensively reveal the learning dynamics of neural networks trained with normalization, weight decay (WD), and SGD (with momentum), which we call Spherical Motion Dynamics (SMD). Most related works study SMD by focusing on the "effective learning rate" under the "equilibrium" condition, in which the weight norm remains unchanged. However, their discussions of why the equilibrium condition can be reached in SMD are either absent or unconvincing. Our work investigates SMD by directly exploring the cause of the equilibrium condition. Specifically, 1) we introduce assumptions that can lead to the equilibrium condition in SMD, and prove that the weight norm converges at a linear rate under these assumptions; 2) we propose the "angular update" as a substitute for the effective learning rate to measure how a neural network evolves in SMD, and prove that the angular update also converges to its theoretical value at a linear rate; 3) we verify our assumptions and theoretical results on various computer vision tasks, including ImageNet and MSCOCO, with standard settings. Experimental results show that our theoretical findings agree well with empirical observations.
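The two quantities the abstract tracks, the weight norm and the angular update, can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration and not the paper's setup: the toy scale-invariant loss, the function names `angular_update`, `toy_gradient`, and `simulate`, the hyperparameters, and the use of plain SGD without momentum. The comparison value `sqrt(2 * lr * wd)` is what a simple balance between weight decay and gradient growth gives for this toy; it is not quoted from the abstract.

```python
import numpy as np

def angular_update(w_prev, w_next):
    """Angle in radians between two consecutive weight vectors."""
    cos = np.dot(w_prev, w_next) / (np.linalg.norm(w_prev) * np.linalg.norm(w_next))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def toy_gradient(w, target):
    """Gradient of the toy loss L(w) = -<w / ||w||, target>.

    The loss depends only on the direction of w (scale invariance),
    so the gradient is orthogonal to w and scales as 1 / ||w||.
    """
    norm = np.linalg.norm(w)
    u = w / norm
    return -(target - np.dot(u, target) * u) / norm

def simulate(steps=2000, dim=64, lr=0.1, wd=0.05, seed=0):
    """Run plain SGD with weight decay on the toy loss (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(dim)
    norms, angles = [], []
    for _ in range(steps):
        target = rng.standard_normal(dim)   # fresh noise stands in for a minibatch
        g = toy_gradient(w, target)
        w_next = w - lr * (g + wd * w)      # SGD step with weight decay
        angles.append(angular_update(w, w_next))
        norms.append(float(np.linalg.norm(w_next)))
        w = w_next
    return np.array(norms), np.array(angles)

norms, angles = simulate()
print("late weight norm:    ", norms[-200:].mean())
print("late angular update: ", angles[-200:].mean())
print("sqrt(2 * lr * wd):   ", np.sqrt(2 * 0.1 * 0.05))
```

In this toy, the weight norm settles once the shrinkage from weight decay balances the growth from gradients orthogonal to the weights, and the per-step angle then hovers around a fixed value. This is only meant to make the notion of "equilibrium" concrete; it does not reproduce the paper's assumptions or experiments.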