Title
Delay Differential Neural Networks
Authors
Abstract
Neural ordinary differential equations (NODEs) treat the computation of intermediate feature vectors as trajectories of an ordinary differential equation parameterized by a neural network. In this paper, we propose a novel model, delay differential neural networks (DDNNs), inspired by delay differential equations (DDEs). The proposed model treats the derivative of the hidden feature vector as a function of the current feature vector and past feature vectors (the history). This function is modeled as a neural network, and consequently the model yields continuous-depth alternatives to many recent ResNet variants. We propose two DDNN architectures that differ in how the current and past feature vectors are combined. For training DDNNs, we provide a memory-efficient adjoint method for computing gradients and back-propagating through the network. DDNNs improve the data efficiency of NODEs by further reducing the number of parameters without affecting generalization performance. Experiments on synthetic and real-world image classification datasets such as CIFAR-10 and CIFAR-100 show the effectiveness of the proposed models.
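Concretely, a DDNN replaces the NODE dynamics dz(t)/dt = f(z(t); θ) with a delayed form such as dz(t)/dt = f(z(t), z(t − τ); θ), where z(t − τ) is a past hidden state. The sketch below is a minimal, hypothetical illustration of this idea in PyTorch, not the authors' implementation: it assumes a single fixed delay τ, a constant initial history, and a simple fixed-step Euler solver in place of the adaptive solvers and adjoint-based training described in the abstract. All names here (DDNNDynamics, euler_dde_solve) are invented for illustration.

```python
import torch
import torch.nn as nn

class DDNNDynamics(nn.Module):
    """Hypothetical dynamics f(z(t), z(t - tau)): the derivative of the
    hidden state depends on both the current and a delayed hidden state."""
    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim, hidden),  # concatenate current and delayed states
            nn.Tanh(),
            nn.Linear(hidden, dim),
        )

    def forward(self, z_t: torch.Tensor, z_delayed: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([z_t, z_delayed], dim=-1))

def euler_dde_solve(f: DDNNDynamics, z0: torch.Tensor, t1: float = 1.0,
                    tau: float = 0.1, dt: float = 0.05) -> torch.Tensor:
    """Fixed-step Euler solver for dz/dt = f(z(t), z(t - tau)).
    For t < 0 the history is taken to be the constant initial state z0,
    a common choice for DDE initial-value problems."""
    lag = max(1, round(tau / dt))  # delay measured in grid steps
    history = [z0]                 # hidden state at each grid point
    z = z0
    for k in range(int(round(t1 / dt))):
        z_delayed = history[max(0, k - lag)]  # z(t - tau); clamps to z0 before t = 0
        z = z + dt * f(z, z_delayed)
        history.append(z)
    return z

# Usage: integrate a batch of 8 sixteen-dimensional feature vectors from t=0 to t=1.
f = DDNNDynamics(dim=16)
z0 = torch.randn(8, 16)
z1 = euler_dde_solve(f, z0)
print(z1.shape)  # torch.Size([8, 16])
```

Storing the history at every grid point, as above, is what the adjoint method mentioned in the abstract is meant to avoid: it recovers gradients without keeping all intermediate states in memory.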