Paper Title

End-to-end Algorithm Synthesis with Recurrent Networks: Logical Extrapolation Without Overthinking

Authors

Arpit Bansal, Avi Schwarzschild, Eitan Borgnia, Zeyad Emam, Furong Huang, Micah Goldblum, Tom Goldstein

Abstract

Machine learning systems perform well on pattern matching tasks, but their ability to perform algorithmic or logical reasoning is not well understood. One important reasoning capability is algorithmic extrapolation, in which models trained only on small/simple reasoning problems can synthesize complex strategies for large/complex problems at test time. Algorithmic extrapolation can be achieved through recurrent systems, which can be iterated many times to solve difficult reasoning problems. We observe that this approach fails to scale to highly complex problems because behavior degenerates when many iterations are applied -- an issue we refer to as "overthinking." We propose a recall architecture that keeps an explicit copy of the problem instance in memory so that it cannot be forgotten. We also employ a progressive training routine that prevents the model from learning behaviors that are specific to iteration number and instead pushes it to learn behaviors that can be repeated indefinitely. These innovations prevent the overthinking problem, and enable recurrent systems to solve extremely hard extrapolation tasks.
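The abstract describes two mechanisms: a recall connection that re-injects an explicit copy of the problem instance at every recurrent iteration, and a progressive training routine that trains the network to make progress from arbitrary intermediate states rather than learning iteration-specific behavior. The sketch below illustrates both ideas in PyTorch. The convolutional block, layer widths, two-channel output head, and the `progressive_loss` helper are illustrative assumptions for a per-pixel task, not the authors' exact implementation.

```python
import torch
import torch.nn as nn


class RecallRecurrentNet(nn.Module):
    """Sketch of a recurrent network with a recall connection: the
    original input x is concatenated to the hidden features at every
    iteration, so the problem instance cannot be forgotten. The block
    structure and sizes here are assumptions, not the paper's exact
    architecture."""

    def __init__(self, in_channels: int, width: int = 64):
        super().__init__()
        self.project = nn.Conv2d(in_channels, width, 3, padding=1)
        # The recurrent block sees the hidden state *plus* a fresh copy
        # of the input (the recall connection).
        self.recur = nn.Sequential(
            nn.Conv2d(width + in_channels, width, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1),
            nn.ReLU(),
        )
        self.head = nn.Conv2d(width, 2, 3, padding=1)  # per-pixel logits

    def forward(self, x: torch.Tensor, iters: int) -> torch.Tensor:
        h = self.project(x)
        for _ in range(iters):
            # Recall: re-inject x at every iteration.
            h = self.recur(torch.cat([h, x], dim=1))
        return self.head(h)


def progressive_loss(model, x, y, criterion, max_iters: int = 30):
    """Sketch of the progressive training idea (simplified): run a
    random number n of iterations with gradients disabled, then train
    the model to improve from that intermediate state over k further
    iterations. This discourages behavior tied to a specific iteration
    count and pushes toward updates that can be repeated indefinitely."""
    n = int(torch.randint(0, max_iters, (1,)))
    k = int(torch.randint(1, max_iters - n + 1, (1,)))
    with torch.no_grad():
        h = model.project(x)
        for _ in range(n):
            h = model.recur(torch.cat([h, x], dim=1))
    h = h.detach()  # gradients flow only through the final k iterations
    for _ in range(k):
        h = model.recur(torch.cat([h, x], dim=1))
    return criterion(model.head(h), y)
```

At test time, extrapolation simply means calling `model(x, iters)` with far more iterations than were used in training; the recall connection and progressive training are what keep the extra iterations from degrading the solution.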
