Paper Title
Learning Rate Perturbation: A Generic Plugin of Learning Rate Schedule towards Flatter Local Minima
Paper Authors
Paper Abstract
The learning rate is one of the most important hyper-parameters and has a significant influence on neural network training. Learning rate schedules are widely used in practice to adjust the learning rate according to a pre-defined schedule for fast convergence and good generalization. However, existing learning rate schedules are all heuristic algorithms and lack theoretical support. As a result, practitioners usually choose a learning rate schedule through multiple ad-hoc trials, and the obtained schedule is sub-optimal. To boost the performance of such sub-optimal learning rate schedules, we propose a generic learning rate schedule plugin, called LEArning Rate Perturbation (LEAP), which can be applied to various learning rate schedules and improves model training by introducing a certain perturbation to the learning rate. We find that, with this simple yet effective strategy, the training process exponentially favors flat minima over sharp minima while retaining guaranteed convergence, which leads to better generalization ability. In addition, we conduct extensive experiments showing that training with LEAP improves the performance of diverse deep learning models on diverse datasets under various learning rate schedules (including a constant learning rate).
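The abstract describes LEAP only at a high level: at each scheduler step, a perturbation is added to the learning rate produced by an arbitrary base schedule. Below is a minimal sketch of that idea in PyTorch. The `PerturbedLR` wrapper, the multiplicative Gaussian noise, and the `sigma` scale are illustrative assumptions, not the paper's exact formulation.

```python
import torch


class PerturbedLR:
    """Sketch of a learning-rate perturbation plugin around any base schedule.

    The abstract only says LEAP injects "a certain perturbation" into the
    learning rate; the multiplicative Gaussian noise and the `sigma`
    hyper-parameter here are illustrative assumptions.
    """

    def __init__(self, optimizer, base_scheduler=None, sigma=0.1):
        self.optimizer = optimizer
        self.base_scheduler = base_scheduler  # any torch scheduler, or None for a constant LR
        self.sigma = sigma
        # Remember each group's unperturbed LR so that noise never compounds.
        self._base_lrs = [g["lr"] for g in optimizer.param_groups]

    def step(self):
        if self.base_scheduler is not None:
            # Let the base schedule advance from clean, unperturbed values.
            for g, lr in zip(self.optimizer.param_groups, self._base_lrs):
                g["lr"] = lr
            self.base_scheduler.step()
            self._base_lrs = [g["lr"] for g in self.optimizer.param_groups]
        # Perturb each group's LR around its scheduled (or constant) value.
        for g, lr in zip(self.optimizer.param_groups, self._base_lrs):
            noise = 1.0 + self.sigma * torch.randn(1).item()
            g["lr"] = max(lr * noise, 0.0)  # clamp so the LR stays non-negative


# Usage sketch: wrap an existing schedule and call step() once per epoch.
# optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# cosine = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)
# scheduler = PerturbedLR(optimizer, base_scheduler=cosine, sigma=0.1)
```

Restoring the unperturbed learning rates before stepping the base scheduler keeps the perturbation centered on the scheduled value rather than compounding across steps; with no base scheduler the wrapper perturbs a constant learning rate, matching the abstract's claim that LEAP also applies in that case.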