Paper Title

Neural Execution Engines: Learning to Execute Subroutines

Paper Authors

Yujun Yan, Kevin Swersky, Danai Koutra, Parthasarathy Ranganathan, Milad Hashemi

Paper Abstract

A significant effort has been made to train neural networks that replicate algorithmic reasoning, but they often fail to learn the abstract concepts underlying these algorithms. This is evidenced by their inability to generalize to data distributions that are outside of their restricted training sets, namely larger inputs and unseen data. We study these generalization issues at the level of numerical subroutines that comprise common algorithms like sorting, shortest paths, and minimum spanning trees. First, we observe that transformer-based sequence-to-sequence models can learn subroutines like sorting a list of numbers, but their performance rapidly degrades as the length of lists grows beyond those found in the training set. We demonstrate that this is due to attention weights that lose fidelity with longer sequences, particularly when the input numbers are numerically similar. To address the issue, we propose a learned conditional masking mechanism, which enables the model to strongly generalize far outside of its training range with near-perfect accuracy on a variety of algorithms. Second, to generalize to unseen data, we show that encoding numbers with a binary representation leads to embeddings with rich structure once trained on downstream tasks like addition or multiplication. This allows the embedding to handle missing data by faithfully interpolating numbers not seen during training.
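To make the two ideas in the abstract concrete, here is a minimal sketch in plain NumPy. The function and parameter names (`to_binary`, `sort_by_masked_min`, `num_bits`) and the bit width are illustrative assumptions, not the paper's actual configuration: the first function shows a fixed-width binary encoding of integers of the kind the abstract describes, and the second implements the masked "pick the minimum of the remaining elements" subroutine exactly, where in the paper a learned attention-based model with a conditional mask approximates this selection step.

```python
import numpy as np

def to_binary(values, num_bits=8):
    """Encode integers as fixed-width binary vectors (MSB first).

    Illustrative stand-in for the binary number representation
    mentioned in the abstract; the bit width is an arbitrary choice.
    """
    values = np.asarray(values, dtype=np.int64)
    return ((values[:, None] >> np.arange(num_bits - 1, -1, -1)) & 1).astype(np.float32)

def sort_by_masked_min(values):
    """Sort by repeatedly selecting the minimum unmasked element,
    then masking it out so later steps are conditioned on that mask.

    This is the exact subroutine; a neural execution engine would
    replace the argmin with a learned, attention-based selection.
    """
    values = np.asarray(values)
    mask = np.ones(len(values), dtype=bool)   # True = still available
    output = []
    for _ in range(len(values)):
        idx = np.argmin(np.where(mask, values, np.iinfo(values.dtype).max))
        output.append(int(values[idx]))
        mask[idx] = False
    return output

print(to_binary([3, 7, 12], num_bits=4))
# [[0. 0. 1. 1.]
#  [0. 1. 1. 1.]
#  [1. 1. 0. 0.]]
print(sort_by_masked_min([12, 3, 7, 3]))
# [3, 3, 7, 12]
```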
