Paper Title
General Cyclical Training of Neural Networks
Paper Authors
Paper Abstract
This paper describes the principle of "General Cyclical Training" in machine learning, where training starts and ends with "easy training" and "hard training" happens during the middle epochs. We propose several manifestations of this principle for training neural networks, including algorithmic examples (via hyper-parameters and loss functions), data-based examples, and model-based examples. Specifically, we introduce several novel techniques: cyclical weight decay, cyclical batch size, cyclical focal loss, cyclical softmax temperature, cyclical data augmentation, cyclical gradient clipping, and cyclical semi-supervised learning. In addition, we demonstrate that cyclical weight decay, cyclical softmax temperature, and cyclical gradient clipping (as three examples of this principle) improve the test accuracy of the trained model. Furthermore, we discuss model-based examples (such as pretraining and knowledge distillation) from the perspective of general cyclical training and recommend some changes to the typical training methodology. In summary, this paper defines the general cyclical training concept and discusses several specific ways in which this concept can be applied to training neural networks. In the spirit of reproducibility, the code used in our experiments is available at \url{https://github.com/lnsmith54/CFL}.
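The abstract does not spell out the exact schedules used for the cyclical hyper-parameters, so the following is only a minimal sketch of the easy-to-hard-to-easy pattern it describes: a hypothetical triangular schedule that returns an "easy" hyper-parameter value at the first and last epochs and a "hard" value at the midpoint of training. The function name and the linear interpolation are illustrative assumptions, not the paper's implementation.

```python
def cyclical_value(epoch, total_epochs, easy_value, hard_value):
    """Hypothetical triangular schedule for a cyclical hyper-parameter.

    Returns easy_value at the first and last epochs and hard_value at
    the midpoint, interpolating linearly in between. Linear
    interpolation is an assumption; the paper may use other shapes.
    """
    mid = (total_epochs - 1) / 2.0
    # progress is 0 at both ends of training and 1 at the midpoint
    progress = 1.0 - abs(epoch - mid) / mid
    return easy_value + (hard_value - easy_value) * progress
```

Such a schedule could, for instance, drive cyclical weight decay by calling it once per epoch with a small "easy" decay and a larger "hard" decay, mirroring the start-easy, middle-hard, end-easy structure the abstract describes.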