Paper title
Gradient-only line searches to automatically determine learning rates for a variety of stochastic training algorithms
Paper authors
Paper abstract
Gradient-only and probabilistic line searches have recently reintroduced the ability to adaptively determine learning rates in dynamic mini-batch sub-sampled neural network training. However, stochastic line searches are still in their infancy and therefore call for ongoing investigation. We study the application of the Gradient-Only Line Search that is Inexact (GOLS-I) to automatically determine the learning rate schedule for a selection of popular neural network training algorithms, including NAG, Adagrad, Adadelta, Adam and LBFGS, with numerous shallow, deep and convolutional neural network architectures trained on different datasets with various loss functions. We find that GOLS-I's learning rate schedules are competitive with manually tuned learning rates across seven optimization algorithms, three types of neural network architecture, 23 datasets and two loss functions. We demonstrate that algorithms with dominant momentum characteristics are not well suited to use with GOLS-I. However, for most popular neural network training algorithms, GOLS-I effectively determines learning rate schedules spanning more than 15 orders of magnitude, effectively removing the need to tune the sensitive hyperparameters of learning rate schedules in neural network training.
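The core idea behind a gradient-only line search is that, along a descent direction, the directional derivative of the (mini-batch) loss changes sign from negative to non-negative at a step size that can serve as the learning rate, even when the sampled function values themselves are noisy. The sketch below illustrates this sign-change search in a minimal form; the bracketing-by-doubling scheme, the `eta` growth factor and the step bounds are illustrative assumptions, not the exact GOLS-I procedure from the paper.

```python
def gradient_only_line_search(dirderiv, step0=1e-8, eta=2.0,
                              max_step=1e7, min_step=1e-12):
    """Minimal sketch of an inexact gradient-only line search.

    dirderiv(alpha) returns the directional derivative of the loss
    along the current search direction at step size alpha (negative
    means descent). The step size is grown or shrunk by the factor
    eta until the directional derivative changes sign from negative
    to non-negative, locating an approximate sign-change point.
    """
    alpha = step0
    d = dirderiv(alpha)
    if d < 0:
        # Descent at the initial step: grow alpha until the
        # directional derivative becomes non-negative.
        while d < 0 and alpha < max_step:
            alpha *= eta
            d = dirderiv(alpha)
    else:
        # Ascent at the initial step: shrink alpha until descent
        # is recovered.
        while d >= 0 and alpha > min_step:
            alpha /= eta
            d = dirderiv(alpha)
    return alpha
```

For example, minimizing f(x) = x^2 from x = 1 along the steepest-descent direction gives a directional derivative of -2(1 - alpha) at step size alpha, so the search returns a step size just past the minimizer at alpha = 1.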