Paper Title
"Hey, that's not an ODE": Faster ODE Adjoints via Seminorms
Paper Authors
Paper Abstract
Neural differential equations may be trained by backpropagating gradients via the adjoint method, which is another differential equation typically solved using an adaptive-step-size numerical differential equation solver. A proposed step is accepted if its error, \emph{relative to some norm}, is sufficiently small; else it is rejected, the step is shrunk, and the process is repeated. Here, we demonstrate that the particular structure of the adjoint equations makes the usual choice of norm (such as $L^2$) unnecessarily stringent. By replacing it with a more appropriate (semi)norm, fewer steps are unnecessarily rejected and the backpropagation is made faster. This requires only minor code modifications. Experiments on a wide range of tasks -- including time series, generative modeling, and physical control -- demonstrate a median reduction of 40% in the number of function evaluations. On some problems we see as much as 62% fewer function evaluations, so that the overall training time is roughly halved.
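To make the norm swap concrete, here is a minimal sketch. It assumes the augmented adjoint state is carried as a single flat tensor whose leading entries hold the state and state-adjoint components and whose trailing entries hold the parameter-gradient components; the names (`rms_norm`, `make_seminorm`, `accept_step`) are illustrative and not taken from the paper or any particular solver library.

```python
import torch


def rms_norm(x: torch.Tensor) -> float:
    # Conventional choice: RMS (scaled L^2) norm over the *entire* augmented state.
    return x.pow(2).mean().sqrt().item()


def make_seminorm(num_state_and_adjoint: int):
    # Seminorm for the adjoint pass: measure the local error only on the state
    # and state-adjoint components.  The parameter-gradient components are pure
    # integrals -- their values never feed back into the rest of the adjoint
    # system -- so controlling their step-wise error is unnecessary.
    def seminorm(aug_state: torch.Tensor) -> float:
        return rms_norm(aug_state[:num_state_and_adjoint])
    return seminorm


def accept_step(error_estimate: torch.Tensor, tol: float, norm) -> bool:
    # Generic adaptive-step-size acceptance test: keep the step when the
    # error estimate, measured in the chosen (semi)norm, is within tolerance.
    return norm(error_estimate) <= tol
```

Under such an acceptance test, steps whose error is concentrated in the parameter-gradient channels are no longer rejected, which is the source of the reported speed-up; how a seminorm is actually supplied to a given ODE solver depends on that solver's interface.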