Paper Title

Guarantees for Tuning the Step Size using a Learning-to-Learn Approach

Paper Authors

Xiang Wang, Shuai Yuan, Chenwei Wu, Rong Ge

Paper Abstract

Choosing the right parameters for optimization algorithms is often the key to their success in practice. Solving this problem using a learning-to-learn approach -- using meta-gradient descent on a meta-objective based on the trajectory that the optimizer generates -- was recently shown to be effective. However, the meta-optimization problem is difficult. In particular, the meta-gradient can often explode/vanish, and the learned optimizer may not have good generalization performance if the meta-objective is not chosen carefully. In this paper we give meta-optimization guarantees for the learning-to-learn approach on a simple problem of tuning the step size for quadratic loss. Our results show that the naïve objective suffers from a meta-gradient explosion/vanishing problem. Although there is a way to design the meta-objective so that the meta-gradient remains polynomially bounded, computing the meta-gradient directly using backpropagation leads to numerical issues. We also characterize when it is necessary to compute the meta-objective on a separate validation set to ensure the generalization performance of the learned optimizer. Finally, we verify our results empirically and show that a similar phenomenon appears even for more complicated learned optimizers parametrized by neural networks.
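To make the setup concrete, the sketch below illustrates the learning-to-learn approach described in the abstract on its toy problem: a scalar step size is tuned by meta-gradient descent, where the meta-gradient is obtained by backpropagating through the unrolled trajectory of gradient descent on a quadratic loss. This is only an illustrative sketch, not the paper's construction: the random PSD quadratic, the naïve meta-objective (loss at the final iterate), the log-parametrization of the step size, and the gradient clipping are all assumptions made here for the example.

```python
# Illustrative sketch (not the paper's exact construction): tune a scalar
# step size by meta-gradient descent, backpropagating through the unrolled
# trajectory of gradient descent on a quadratic loss.
# Assumptions made here: the naive meta-objective (loss at the final
# iterate), a random PSD matrix H, a log-parametrized step size, and crude
# gradient clipping as a guard against meta-gradient explosion.
import jax
import jax.numpy as jnp

d, T = 10, 20                          # problem dimension, unroll length

key = jax.random.PRNGKey(0)
A = jax.random.normal(key, (d, d))
H = A @ A.T / d                        # random PSD "Hessian" of the quadratic
x0 = jnp.ones(d)                       # fixed initial iterate

def quad_loss(x):
    return 0.5 * x @ H @ x

def meta_objective(log_eta):
    """Unroll T steps of gradient descent with step size exp(log_eta)
    and return the loss at the final iterate (the naive meta-objective)."""
    eta = jnp.exp(log_eta)             # log-parametrization keeps eta > 0
    x = x0
    for _ in range(T):
        x = x - eta * jax.grad(quad_loss)(x)
    return quad_loss(x)

# Meta-gradient: backpropagation through the whole optimization trajectory.
meta_grad = jax.jit(jax.grad(meta_objective))

log_eta, meta_lr = jnp.log(0.01), 0.1
for _ in range(200):
    g = jnp.clip(meta_grad(log_eta), -1.0, 1.0)   # crude guard, see note below
    log_eta = log_eta - meta_lr * g

print("learned step size:", float(jnp.exp(log_eta)))
print("final meta-objective:", float(meta_objective(log_eta)))
```

The clipping line is only a crude guard added for this sketch; the abstract's point is precisely that the naïve meta-objective can make the meta-gradient explode or vanish, and that a better-designed meta-objective keeps the meta-gradient polynomially bounded (though computing it directly by backpropagation still raises numerical issues).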
