Title
Increasing Depth of Neural Networks for Life-long Learning
Authors
Abstract
Purpose: We propose a novel method for continual learning based on increasing the depth of neural networks. This work explores whether extending neural network depth may be beneficial in a life-long learning setting.

Methods: Our approach is based on adding new layers on top of existing ones to enable the forward transfer of knowledge and the adaptation of previously learned representations. We employ a method for determining the most similar task in order to select the best location in the network at which to add new nodes with trainable parameters. This approach allows for creating a tree-like model, in which each node is a set of neural network parameters dedicated to a specific task. The Progressive Neural Network (PNN) concept inspires the proposed method, which therefore benefits from dynamic changes in network structure. However, PNN allocates a lot of memory for the whole network structure during the learning process. The proposed method alleviates this by adding only part of a network for each new task and reusing a subset of previously trained weights. At the same time, we retain the benefits of PNN, such as no forgetting guaranteed by design, without needing a memory buffer.

Results: Experiments on Split CIFAR and Split Tiny ImageNet show that the proposed algorithm is on par with other continual learning methods. In a more challenging setup, where each task is a separate computer vision dataset, our method outperforms Experience Replay.

Conclusion: Our method is compatible with commonly used computer vision architectures and does not require a custom network structure. Since adaptation to changing data distributions is achieved by expanding the architecture, there is no need for a rehearsal buffer. For this reason, our method can be used in sensitive applications where data privacy must be considered.
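The tree-growing mechanism described in the Methods section can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch-style example; the abstract gives no implementation details, so the class names (TreeNode, DepthTreeModel), the linear blocks, and the activation-based task-similarity proxy are illustrative assumptions rather than the paper's actual architecture. It only shows the general idea: previously trained nodes are frozen and reused, and each new task attaches a new trainable node under the most similar existing branch.

```python
# Hypothetical sketch of a tree-like, depth-growing continual learner.
# Names and the similarity measure are assumptions made for illustration.
import torch
import torch.nn as nn


class TreeNode(nn.Module):
    """One node: a block of layers dedicated to a single task (or a shared stem)."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.block = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())
        self.child_nodes = []             # task-specific branches grown later
        self.out_dim = out_dim

    def forward(self, x):
        return self.block(x)

    def freeze(self):
        for p in self.parameters():
            p.requires_grad_(False)


class DepthTreeModel:
    """Grows the network depth-wise: each new task adds a node under the most
    similar existing branch, so old weights are reused but never changed."""

    def __init__(self, in_dim, hidden_dim):
        self.root = TreeNode(in_dim, hidden_dim)
        self.task_heads = {}              # task_id -> (path of nodes, classifier head)

    def _forward_path(self, path, x):
        for node in path:
            x = node(x)
        return x

    def _task_similarity(self, path, x_new):
        # Assumed proxy: how strongly an existing path activates on the new
        # task's data; the paper's actual similarity measure may differ.
        with torch.no_grad():
            return self._forward_path(path, x_new).abs().mean().item()

    def add_task(self, task_id, x_new, n_classes):
        # Candidate attachment points: every path that already exists.
        candidate_paths = [[self.root]] + [p for p, _ in self.task_heads.values()]
        best_path = max(candidate_paths, key=lambda p: self._task_similarity(p, x_new))

        # Freeze the reused part of the tree, then grow a new trainable node.
        for node in best_path:
            node.freeze()
        new_node = TreeNode(best_path[-1].out_dim, best_path[-1].out_dim)
        best_path[-1].child_nodes.append(new_node)
        head = nn.Linear(new_node.out_dim, n_classes)
        self.task_heads[task_id] = (best_path + [new_node], head)
        return new_node, head             # only these parameters are trained

    def predict(self, task_id, x):
        path, head = self.task_heads[task_id]
        return head(self._forward_path(path, x))


# Hypothetical usage: grow the tree for one task on random data.
model = DepthTreeModel(in_dim=32, hidden_dim=64)
new_node, head = model.add_task("task_0", torch.randn(16, 32), n_classes=10)
```

Because old parameters are frozen and each new task only adds a branch, forgetting is prevented by construction and no rehearsal buffer of past samples is needed, which mirrors the properties claimed in the abstract.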