Paper Title

Reverse engineering learned optimizers reveals known and novel mechanisms

Authors

Niru Maheswaranathan, David Sussillo, Luke Metz, Ruoxi Sun, Jascha Sohl-Dickstein

Abstract

Learned optimizers are algorithms that can themselves be trained to solve optimization problems. In contrast to baseline optimizers (such as momentum or Adam) that use simple update rules derived from theoretical principles, learned optimizers use flexible, high-dimensional, nonlinear parameterizations. Although this can lead to better performance in certain settings, their inner workings remain a mystery. How is a learned optimizer able to outperform a well-tuned baseline? Has it learned a sophisticated combination of existing optimization techniques, or is it implementing completely new behavior? In this work, we address these questions by careful analysis and visualization of learned optimizers. We study learned optimizers trained from scratch on three disparate tasks, and discover that they have learned interpretable mechanisms, including momentum, gradient clipping, learning rate schedules, and a new form of learning rate adaptation. Moreover, we show how the dynamics of learned optimizers enable these behaviors. Our results help elucidate the previously murky understanding of how learned optimizers work, and establish tools for interpreting future learned optimizers.
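
The contrast the abstract draws between hand-designed update rules and learned parameterizations can be made concrete. Below is a minimal NumPy sketch: momentum and gradient clipping written as simple closed-form rules, next to a toy learned optimizer whose update is produced by a small per-parameter network. The class name `LearnedOptimizer`, the two-feature input, and the random weights are illustrative assumptions, not the architecture studied in the paper; a real learned optimizer would have its weights meta-trained on a distribution of tasks.

```python
import numpy as np

def momentum_update(grad, velocity, lr=0.1, beta=0.9):
    """Hand-designed momentum: a simple rule derived from theory."""
    velocity = beta * velocity + grad
    return -lr * velocity, velocity

def clipped_update(grad, lr=0.1, clip=1.0):
    """Hand-designed gradient clipping: bound the step size elementwise."""
    return -lr * np.clip(grad, -clip, clip)

class LearnedOptimizer:
    """Toy learned optimizer: a tiny per-parameter MLP maps gradient
    features to an update. In practice W1 and W2 would be meta-trained;
    random weights here only show the shape of the computation."""

    def __init__(self, hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(2, hidden))
        self.W2 = rng.normal(scale=0.1, size=(hidden, 1))

    def update(self, grad, velocity, beta=0.9):
        velocity = beta * velocity + grad            # momentum-like feature
        feats = np.stack([grad, velocity], axis=-1)  # (..., 2) input features
        h = np.tanh(feats @ self.W1)                 # nonlinear hidden layer
        step = (h @ self.W2)[..., 0]                 # flexible learned update
        return step, velocity

# Toy usage on f(x) = 0.5 * ||x||^2, whose gradient is x itself.
x = np.array([1.0, -2.0])
v = np.zeros_like(x)
opt = LearnedOptimizer()
for _ in range(5):
    step, v = opt.update(x, v)  # untrained here, so the step is arbitrary
    x = x + step
```

The point of the sketch is structural: the hand-designed rules expose one or two interpretable hyperparameters, whereas the learned update is an opaque nonlinear function of its inputs. It is exactly this opacity that the paper's analysis and visualization tools are built to reverse engineer.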
