Paper Title


Recognizing Long Grammatical Sequences Using Recurrent Networks Augmented With An External Differentiable Stack

Paper Authors

Ankur Mali, Alexander Ororbia, Daniel Kifer, Clyde Lee Giles

Paper Abstract


Recurrent neural networks (RNNs) are a widely used deep architecture for sequence modeling, generation, and prediction. Despite success in applications such as machine translation and speech recognition, these stateful models have several critical shortcomings. Specifically, RNNs generalize poorly over very long sequences, which limits their applicability to many important temporal processing and time series forecasting problems. For example, RNNs struggle to recognize complex context-free languages (CFLs), never reaching 100% accuracy on training data. One way to address these shortcomings is to couple an RNN with an external, differentiable memory structure, such as a stack. However, differentiable memories in prior work have neither been extensively studied on CFLs nor tested on sequences longer than those seen in training. The few efforts that have studied them have shown that continuous differentiable memory structures yield poor generalization for complex CFLs, making the RNN less interpretable. In this paper, we improve the memory-augmented RNN with important architectural and state-updating mechanisms that ensure the model learns to properly balance the use of its latent states with external memory. Our improved RNN models exhibit better generalization performance and are able to classify long strings generated by complex hierarchical context-free grammars (CFGs). We evaluate our models on CFGs, including the Dyck languages, as well as on the Penn Treebank language modelling task, and achieve stable, robust performance across these benchmarks. Furthermore, we show that only our memory-augmented networks are capable of retaining memory over longer durations, up to strings of length 160.
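To make the idea of coupling an RNN controller with an external, differentiable stack concrete, below is a minimal sketch in PyTorch, assuming a soft push/pop/no-op gating scheme in the spirit of stack-augmented RNNs. The class name StackRNNCell, the stack_dim and stack_depth parameters, and the specific gating are illustrative assumptions for exposition, not the architecture or state-update mechanisms proposed in the paper.

```python
# Minimal sketch of an RNN controller coupled with a continuous,
# differentiable stack (illustrative; not the paper's exact model).
import torch
import torch.nn as nn
import torch.nn.functional as F

class StackRNNCell(nn.Module):
    def __init__(self, input_dim, hidden_dim, stack_dim, stack_depth=16):
        super().__init__()
        self.stack_depth = stack_depth
        # Controller: a plain RNN cell that also reads the current stack top.
        self.rnn = nn.RNNCell(input_dim + stack_dim, hidden_dim)
        # Soft action distribution over {push, pop, no-op}.
        self.action = nn.Linear(hidden_dim, 3)
        # Value written to the stack when pushing.
        self.push_val = nn.Linear(hidden_dim, stack_dim)

    def forward(self, x, h, stack):
        # stack: (batch, stack_depth, stack_dim); row 0 is the top.
        top = stack[:, 0, :]
        h = self.rnn(torch.cat([x, top], dim=-1), h)
        a = F.softmax(self.action(h), dim=-1)            # (batch, 3)
        push, pop, noop = a[:, 0:1], a[:, 1:2], a[:, 2:3]
        new_val = torch.tanh(self.push_val(h))           # (batch, stack_dim)
        # Candidate stacks after a hard push / pop / no-op, blended softly
        # by the action probabilities so the whole step stays differentiable.
        pushed = torch.cat([new_val.unsqueeze(1), stack[:, :-1, :]], dim=1)
        popped = torch.cat([stack[:, 1:, :],
                            torch.zeros_like(stack[:, :1, :])], dim=1)
        stack = (push.unsqueeze(-1) * pushed
                 + pop.unsqueeze(-1) * popped
                 + noop.unsqueeze(-1) * stack)
        return h, stack

# Usage example: run the cell over a batch of symbol sequences (e.g. a
# Dyck-like bracket language) and classify accept/reject from the final state.
if __name__ == "__main__":
    batch, seq_len, vocab, hidden, sdim = 4, 20, 3, 32, 8
    cell = StackRNNCell(vocab, hidden, sdim)
    readout = nn.Linear(hidden, 2)
    x = F.one_hot(torch.randint(0, vocab, (batch, seq_len)), vocab).float()
    h = torch.zeros(batch, hidden)
    stack = torch.zeros(batch, cell.stack_depth, sdim)
    for t in range(seq_len):
        h, stack = cell(x[:, t, :], h, stack)
    print(readout(h).shape)  # torch.Size([4, 2])
```

Because the push, pop, and no-op actions are mixed by softmax weights rather than chosen discretely, gradients flow through the stack contents, which is the property that lets such memory structures be trained end-to-end with the controller.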
