A memory-efficient neural ODE framework based on high-level adjoint differentiation

Title

A memory-efficient neural ODE framework based on high-level adjoint differentiation

Authors

Zhang, Hong; Zhao, Wenjun

Abstract

Neural ordinary differential equations (neural ODEs) have emerged as a novel network architecture that bridges dynamical systems and deep learning. However, the gradient obtained with the continuous adjoint method in the vanilla neural ODE is not reverse-accurate. Other approaches suffer either from an excessive memory requirement due to deep computational graphs or from limited choices for the time integration scheme, hampering their application to large-scale complex dynamical systems. To achieve accurate gradients without compromising memory efficiency and flexibility, we present a new neural ODE framework, PNODE, based on high-level discrete adjoint algorithmic differentiation. By leveraging discrete adjoint time integrators and advanced checkpointing strategies tailored for these integrators, PNODE can provide a balance between memory and computational costs, while computing the gradients consistently and accurately. We provide an open-source implementation based on PyTorch and PETSc, one of the most commonly used portable, scalable scientific computing libraries. We demonstrate the performance through extensive numerical experiments on image classification and continuous normalizing flow problems. We show that PNODE achieves the highest memory efficiency when compared with other reverse-accurate methods. On the image classification problems, PNODE is up to two times faster than the vanilla neural ODE and up to 2.3 times faster than the best existing reverse-accurate method. We also show that PNODE enables the use of the implicit time integration methods that are needed for stiff dynamical systems.
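The core idea in the abstract is to differentiate the discrete time-stepping scheme itself instead of integrating a continuous adjoint ODE, and to trade memory for recomputation with checkpoints. The toy example below illustrates that idea. It is a minimal sketch, not PNODE's implementation. It assumes a scalar ODE dx/dt = -theta*x + sin(x), forward Euler, and a loss L = x_N^2/2. In place of the tailored checkpointing strategies the paper describes, it uses a simple uniform scheme: every k-th state is stored, and each segment is recomputed during the reverse sweep. All function names are hypothetical.

```python
import math


def f(x, theta):
    # Hypothetical toy right-hand side: dx/dt = -theta*x + sin(x).
    return -theta * x + math.sin(x)


def df_dx(x, theta):
    return -theta + math.cos(x)


def df_dtheta(x, theta):
    return -x


def euler_step(x, theta, h):
    # One forward Euler step: x_{n+1} = x_n + h * f(x_n, theta).
    return x + h * f(x, theta)


def loss(x0, theta, h, n_steps):
    # Loss of the *discrete* trajectory: L = 0.5 * x_N**2.
    x = x0
    for _ in range(n_steps):
        x = euler_step(x, theta, h)
    return 0.5 * x * x


def forward(x0, theta, h, n_steps, every):
    """Run forward Euler, keeping only every `every`-th state as a checkpoint."""
    checkpoints = {0: x0}
    x = x0
    for n in range(n_steps):
        x = euler_step(x, theta, h)
        if (n + 1) % every == 0:
            checkpoints[n + 1] = x
    return x, checkpoints


def discrete_adjoint_grad(x0, theta, h, n_steps, every):
    """Return (dL/dtheta, dL/dx0) via the discrete adjoint of forward Euler."""
    x_final, checkpoints = forward(x0, theta, h, n_steps, every)
    lam = x_final  # lambda_N = dL/dx_N for L = 0.5 * x_N**2
    grad = 0.0
    # Visit segments from last to first; seg_start is always a stored checkpoint.
    seg_start = ((n_steps - 1) // every) * every
    while seg_start >= 0:
        seg_end = min(seg_start + every, n_steps)
        # Recompute x_{seg_start}, ..., x_{seg_end - 1} from the checkpoint.
        states = [checkpoints[seg_start]]
        for _ in range(seg_start, seg_end - 1):
            states.append(euler_step(states[-1], theta, h))
        # Reverse sweep: accumulate with lambda_{n+1}, then set
        # lambda_n = lambda_{n+1} * (1 + h * df/dx(x_n)).
        for x_n in reversed(states):
            grad += lam * h * df_dtheta(x_n, theta)
            lam = lam * (1.0 + h * df_dx(x_n, theta))
        seg_start -= every
    return grad, lam


if __name__ == "__main__":
    g, gx0 = discrete_adjoint_grad(1.0, 0.5, 0.01, 100, every=10)
    eps = 1e-6
    fd = (loss(1.0, 0.5 + eps, 0.01, 100) - loss(1.0, 0.5 - eps, 0.01, 100)) / (2 * eps)
    print(g, fd)  # the two values agree up to finite-difference round-off
```

The adjoint recursion is the exact derivative of the discrete Euler map. As a result, the gradient matches a finite-difference estimate of the discrete loss up to round-off, which is the sense of "reverse-accurate" used above. A continuous adjoint, by contrast, generally agrees only up to the discretization error of the forward and adjoint solves. With N steps, storing every k-th state keeps roughly N/k + k states in memory instead of N, at the cost of about one extra forward pass of recomputation.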