Paper Title
DDPNOpt: Differential Dynamic Programming Neural Optimizer
Paper Authors
Paper Abstract
Interpreting the training of Deep Neural Networks (DNNs) as an optimal control problem over nonlinear dynamical systems has received considerable attention recently, yet algorithmic development along this line remains relatively limited. In this work, we take a step in that direction by reformulating the training procedure from a trajectory optimization perspective. We first show that the most widely used algorithms for training DNNs can be linked to Differential Dynamic Programming (DDP), a celebrated second-order method rooted in Approximate Dynamic Programming. In this vein, we propose a new class of optimizer, the DDP Neural Optimizer (DDPNOpt), for training feedforward and convolutional networks. DDPNOpt features layer-wise feedback policies which improve convergence and reduce sensitivity to hyper-parameters compared to existing methods. It outperforms other optimal-control-inspired training methods in both convergence and complexity, and is competitive with state-of-the-art first- and second-order methods. We also observe that DDPNOpt has a surprising benefit in preventing vanishing gradients. Our work opens up new avenues for principled algorithmic design built upon optimal control theory.
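To make the abstract's central idea concrete, the sketch below illustrates a generic DDP backward sweep in the DNN-training interpretation: each layer t is treated as one stage of a discrete-time system x_{t+1} = f(x_t, u_t), where x_t is the layer's activation and u_t its weights, and the sweep returns, per layer, an open-loop update k_t together with a feedback gain K_t acting on activation deviations. This is a minimal, hypothetical sketch of the standard DDP recursion, not the authors' DDPNOpt implementation; the function name `ddp_backward`, the assumption of equal state dimension across layers, and the regularization term `reg` are illustrative assumptions.

```python
# Minimal sketch of a DDP-style backward pass (illustrative only; not the
# authors' DDPNOpt code).  Propagates a quadratic approximation of the value
# function from the loss back through the layers and returns layer-wise
# open-loop terms k_t and feedback gains K_t.

import numpy as np

def ddp_backward(fx, fu, lx, lu, lxx, luu, lux, Vx, Vxx, reg=1e-6):
    """One DDP backward sweep over T stages (layers).

    fx, fu        : per-stage Jacobians df/dx (n x n) and df/du (n x m)
    lx, lu        : per-stage cost gradients, shapes (n,) and (m,)
    lxx, luu, lux : per-stage cost Hessian blocks
    Vx, Vxx       : gradient / Hessian of the terminal loss w.r.t. the output
    Returns lists of open-loop updates k_t and feedback gains K_t.
    """
    T = len(fx)
    k_list, K_list = [None] * T, [None] * T
    for t in reversed(range(T)):
        # Quadratic expansion of the stage-wise Q-function.
        Qx = lx[t] + fx[t].T @ Vx
        Qu = lu[t] + fu[t].T @ Vx
        Qxx = lxx[t] + fx[t].T @ Vxx @ fx[t]
        Quu = luu[t] + fu[t].T @ Vxx @ fu[t] + reg * np.eye(fu[t].shape[1])
        Qux = lux[t] + fu[t].T @ Vxx @ fx[t]

        # Open-loop update (analogue of the usual weight update) and the
        # layer-wise feedback policy on activation deviations.
        Quu_inv = np.linalg.inv(Quu)
        k = -Quu_inv @ Qu
        K = -Quu_inv @ Qux
        k_list[t], K_list[t] = k, K

        # Propagate the quadratic value-function approximation one stage back.
        Vx = Qx + K.T @ Quu @ k + K.T @ Qu + Qux.T @ k
        Vxx = Qxx + K.T @ Quu @ K + K.T @ Qux + Qux.T @ K
    return k_list, K_list
```

In the subsequent weight-update pass, layer t's parameters would be changed by δu_t = k_t + K_t δx_t, where δx_t is the deviation of the current activation from the one used to build the expansion. With K_t = 0 this collapses to a standard second-order update (or, with Quu approximated by the identity, a first-order one), which is the link between DDP and existing training algorithms that the abstract alludes to.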