Paper Title

A Qualitative Study of the Dynamic Behavior for Adaptive Gradient Algorithms

Authors

Chao Ma, Lei Wu, Weinan E

Abstract

The dynamic behavior of the RMSprop and Adam algorithms is studied through a combination of careful numerical experiments and theoretical explanations. Three types of qualitative features are observed in the training loss curve: fast initial convergence, oscillations, and large spikes in the late phase. The sign gradient descent (signGD) flow, which is the limit of Adam when taking the learning rate to 0 while keeping the momentum parameters fixed, is used to explain the fast initial convergence. For the late phase of Adam, three different types of qualitative patterns are observed depending on the choice of the hyper-parameters: oscillations, spikes, and divergence. In particular, Adam converges much more smoothly, and even faster, when the values of the two momentum factors are close to each other. This observation is particularly important for scientific computing tasks, for which the training process usually proceeds into the high-precision regime.
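The signGD limit mentioned in the abstract can be illustrated with a minimal sketch (this is not the paper's code): on a toy ill-conditioned quadratic, a standard Adam update with a small learning rate behaves, coordinate-wise, much like stepping along the sign of the gradient. The quadratic, the hyper-parameter values, and the step counts below are illustrative assumptions.

```python
import numpy as np

# Assumed toy problem: f(x) = 0.5 * sum(D * x^2), an ill-conditioned diagonal quadratic.
D = np.array([1.0, 10.0])
x0 = np.array([1.0, 1.0])

def grad(x):
    # Gradient of the quadratic above.
    return D * x

def adam(x, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    # Standard Adam with bias correction (default hyper-parameters assumed).
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        mhat = m / (1 - beta1**t)
        vhat = v / (1 - beta2**t)
        x = x - lr * mhat / (np.sqrt(vhat) + eps)
    return x

def signgd(x, lr=1e-3, steps=2000):
    # signGD: step along the coordinate-wise sign of the gradient, so every
    # coordinate makes progress at the same rate regardless of its curvature.
    for _ in range(steps):
        x = x - lr * np.sign(grad(x))
    return x
```

Because the signGD step size is the same in every coordinate, the poorly scaled coordinate (curvature 10) shrinks as fast as the well-scaled one, which is one way to picture the fast initial convergence the paper attributes to the signGD flow.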
