Paper Title
Incremental Training of a Recurrent Neural Network Exploiting a Multi-Scale Dynamic Memory
Paper Authors
Paper Abstract
The effectiveness of recurrent neural networks can be largely influenced by their ability to store, in their dynamic memory, information extracted from input sequences at different frequencies and timescales. Such a feature can be introduced into a neural architecture by an appropriate modularization of the dynamic memory. In this paper we propose a novel incrementally trained recurrent architecture explicitly targeting multi-scale learning. First, we show how to extend the architecture of a simple RNN by separating its hidden state into different modules, each subsampling the network's hidden activations at a different frequency. Then, we discuss a training algorithm in which new modules are iteratively added to the model to progressively learn longer dependencies. Each new module works at a slower frequency than the previous ones and is initialized to encode the subsampled sequence of hidden activations. Experimental results on synthetic and real-world datasets for speech recognition and handwritten character recognition show that the modular architecture and the incremental training algorithm improve the ability of recurrent neural networks to capture long-term dependencies.
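To make the multi-scale memory idea from the abstract concrete, the following is a minimal sketch (not the authors' implementation) of a recurrent model whose hidden state is split into modules that update at progressively slower clock rates; the class name, the power-of-two clock schedule, and the chaining of faster modules into slower ones are illustrative assumptions.

```python
# Illustrative sketch of a multi-scale recurrent memory (assumptions, not the paper's code):
# module i updates its state only every 2**i steps, so later modules see a
# subsampled, slower view of the sequence, mirroring the modular hidden state
# described in the abstract.
import torch
import torch.nn as nn


class MultiScaleRNN(nn.Module):
    def __init__(self, input_size: int, hidden_size: int, num_modules: int):
        super().__init__()
        # Module 0 reads the raw input; each later module reads the previous module's state.
        self.cells = nn.ModuleList(
            [nn.RNNCell(input_size if i == 0 else hidden_size, hidden_size)
             for i in range(num_modules)]
        )
        # Assumed clock schedule: module i is updated once every 2**i time steps.
        self.clock_rates = [2 ** i for i in range(num_modules)]
        self.hidden_size = hidden_size

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (seq_len, batch, input_size)
        seq_len, batch, _ = x.shape
        states = [x.new_zeros(batch, self.hidden_size) for _ in self.cells]
        for t in range(seq_len):
            inp = x[t]
            for i, (cell, rate) in enumerate(zip(self.cells, self.clock_rates)):
                if t % rate == 0:          # update this module only on its own (slower) clock
                    states[i] = cell(inp, states[i])
                inp = states[i]            # faster module feeds the next, slower one
        # Concatenation of all module states acts as the multi-scale memory read-out.
        return torch.cat(states, dim=-1)


# Usage example: start with one module and grow the memory, loosely echoing the
# incremental training scheme (adding and initializing new modules is left out here).
model = MultiScaleRNN(input_size=8, hidden_size=16, num_modules=3)
out = model(torch.randn(32, 4, 8))  # -> shape (4, 48)
```

In this sketch, adding a new, slower module corresponds to appending another cell with a longer clock period; the paper's incremental procedure additionally initializes each new module to encode the subsampled sequence of hidden activations before further training.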