Paper Title
Tasks, stability, architecture, and compute: Training more effective learned optimizers, and using them to train themselves
Paper Authors
Paper Abstract
Much as replacing hand-designed features with learned functions has revolutionized how we solve perceptual tasks, we believe learned algorithms will transform how we train models. In this work we focus on general-purpose learned optimizers capable of training a wide variety of problems with no user-specified hyperparameters. We introduce a new, neural-network-parameterized, hierarchical optimizer with access to additional features, such as validation loss, to enable automatic regularization. Most learned optimizers have been trained on only a single task, or a small number of tasks. We train our optimizers on thousands of tasks, making use of orders of magnitude more compute, resulting in optimizers that generalize better to unseen tasks. The learned optimizers not only perform well, but learn behaviors that are distinct from those of existing first-order optimizers. For instance, they generate update steps that have implicit regularization and that adapt as problem hyperparameters (e.g., batch size) or architecture (e.g., neural network width) change. Finally, these learned optimizers show evidence of being useful for out-of-distribution tasks, such as training themselves from scratch.
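To make the core idea concrete, below is a minimal sketch of a learned optimizer in JAX. It is not the paper's architecture: the per-parameter feature set (gradient, momentum accumulator, parameter value), the MLP sizes, and the 0.01 output scale are all illustrative assumptions, and the hierarchical structure and validation-loss features described in the abstract are omitted. The point is only that the optimizer itself is a small neural network whose weights (opt_params) are what meta-training would learn.

    # A minimal sketch of a learned optimizer (illustrative, not the
    # paper's architecture): a small MLP, applied independently to each
    # parameter, maps simple features to an update step.
    import jax
    import jax.numpy as jnp

    def init_opt_params(key, hidden=32, n_features=3):
        """Initialize the weights of the learned optimizer's MLP."""
        k1, k2 = jax.random.split(key)
        return {
            "w1": jax.random.normal(k1, (n_features, hidden)) * 0.1,
            "b1": jnp.zeros(hidden),
            "w2": jax.random.normal(k2, (hidden, 1)) * 0.1,
            "b2": jnp.zeros(1),
        }

    def learned_update(opt_params, grad, momentum, param):
        """Compute an update for one parameter tensor from its features."""
        # Stack per-parameter features along a trailing axis: (..., 3).
        feats = jnp.stack([grad, momentum, param], axis=-1)
        h = jnp.tanh(feats @ opt_params["w1"] + opt_params["b1"])
        # Small output scale (an assumption) keeps early updates conservative.
        return 0.01 * (h @ opt_params["w2"] + opt_params["b2"])[..., 0]

    def apply_learned_optimizer(opt_params, params, grads, momenta, beta=0.9):
        """One optimizer step over a pytree of model parameters."""
        momenta = jax.tree_util.tree_map(
            lambda m, g: beta * m + (1 - beta) * g, momenta, grads)
        updates = jax.tree_util.tree_map(
            lambda g, m, p: learned_update(opt_params, g, m, p),
            grads, momenta, params)
        new_params = jax.tree_util.tree_map(
            lambda p, u: p - u, params, updates)
        return new_params, momenta

In the setting the abstract describes, opt_params would be meta-trained by unrolling apply_learned_optimizer over many inner training tasks and descending the resulting meta-loss, which is how training on thousands of tasks produces an optimizer that generalizes to unseen ones.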