Paper Title

Automatic Tuning of Stochastic Gradient Descent with Bayesian Optimisation

Paper Authors

Victor Picheny, Vincent Dutordoir, Artem Artemev, Nicolas Durrande

Paper Abstract

Many machine learning models require a training procedure based on running stochastic gradient descent. A key element for the efficiency of these algorithms is the choice of the learning rate schedule. While finding good learning rate schedules using Bayesian optimisation has been tackled by several authors, adapting the learning rate dynamically in a data-driven way is an open question. This is of high practical importance to users who need to train a single, expensive model. To tackle this problem, we introduce an original probabilistic model for traces of optimisers, based on latent Gaussian processes and an autoregressive formulation, that flexibly adjusts to abrupt changes of behaviour induced by new learning rate values. As illustrated, this model is well-suited to tackling a set of problems: first, the online adaptation of the learning rate for a cold-started run; then, tuning the schedule for a set of similar tasks (in a classical BO setup), as well as warm-starting it for a new task.
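To make the abstract's central point concrete, here is a minimal, self-contained sketch (not the authors' method) of why the learning rate schedule matters for SGD. It runs noisy gradient descent on a toy quadratic objective under two hypothetical schedules, constant and step decay, and records the loss trace; the final trace value is the kind of quantity a Bayesian optimisation loop would treat as the objective when tuning the schedule. All function and variable names are illustrative.

```python
import random

def sgd_trace(lr_schedule, steps=100, seed=0):
    """Run noisy gradient descent on f(w) = w^2 and record the loss trace."""
    rng = random.Random(seed)
    w = 5.0
    trace = []
    for t in range(steps):
        lr = lr_schedule(t)
        grad = 2.0 * w + rng.gauss(0.0, 0.5)  # stochastic gradient of w^2
        w -= lr * grad
        trace.append(w * w)  # loss at the current iterate
    return trace

# Two candidate schedules: constant vs. a single step decay at t = 50.
constant = lambda t: 0.1
step_decay = lambda t: 0.1 if t < 50 else 0.01

final_constant = sgd_trace(constant)[-1]
final_decay = sgd_trace(step_decay)[-1]
```

Decaying the learning rate shrinks the noise floor the iterates bounce around, which is exactly the kind of abrupt change in trace behaviour (at the decay point) that the paper's autoregressive latent-GP model is designed to capture.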
