Paper Title
Understanding and Detecting Convergence for Stochastic Gradient Descent with Momentum
Paper Authors
Paper Abstract
Convergence detection for iterative stochastic optimization methods is of great practical interest. This paper considers stochastic gradient descent (SGD) with a constant learning rate and momentum. We show that there exists a transient phase, in which iterates move towards a region of interest, and a stationary phase, in which iterates remain bounded in that region around a minimum point. We construct a statistical diagnostic test for convergence to the stationary phase using the inner product between successive gradients, and demonstrate that the proposed diagnostic works well. We characterize, both theoretically and empirically, how momentum affects the test statistic of the diagnostic, and how the test statistic captures a relatively sparse signal within the gradients near convergence. Finally, we demonstrate an application that automatically tunes the learning rate by reducing it each time stationarity is detected, and show that the procedure is robust to mis-specified initial rates.
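
The abstract does not spell out the exact form of the test statistic, but the description suggests a Pflug-style diagnostic: accumulate the inner products of successive stochastic gradients and declare stationarity once the running sum turns negative. The sketch below is a minimal illustration under that assumption, using heavy-ball momentum on a noisy quadratic; the objective, the burn-in length, the zero threshold, and the halving factor are all illustrative choices, not the paper's specification.

```python
# Minimal sketch (assumptions: Pflug-style running-sum statistic, noisy
# quadratic objective, zero threshold, halving on detection). Not the
# paper's exact diagnostic.
import numpy as np

rng = np.random.default_rng(0)

def noisy_grad(x):
    """Stochastic gradient of f(x) = 0.5 * ||x||^2 with additive noise."""
    return x + 0.1 * rng.standard_normal(x.shape)

x = np.full(10, 5.0)      # iterate, started away from the minimum at 0
v = np.zeros_like(x)      # momentum buffer
lr, beta = 0.1, 0.9       # constant learning rate and momentum
burn_in = 50              # assumed burn-in before the test is active
S, prev_g = 0.0, None     # running test statistic and previous gradient

for t in range(1, 5001):
    g = noisy_grad(x)
    v = beta * v + g       # heavy-ball momentum update
    x = x - lr * v

    if prev_g is not None and t > burn_in:
        S += g @ prev_g    # accumulate inner products of successive gradients
    prev_g = g

    # Stationarity detected: successive gradients are negatively
    # correlated on average, so the running sum drifts below zero.
    if t > burn_in and S < 0:
        print(f"step {t}: stationarity detected, lr {lr:.4f} -> {lr/2:.4f}")
        lr /= 2                      # reduce the rate, as in the abstract
        S, prev_g = 0.0, None        # restart the diagnostic
        burn_in = t + 50             # allow a fresh transient phase
```

The intuition behind the sign of the statistic: during the transient phase the iterates move consistently toward the minimum, so successive gradients tend to be positively correlated; in the stationary phase the iterates oscillate around the minimum, successive gradients point in opposing directions on average, and the running sum drifts negative.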