Deforming the Loss Surface to Affect the Behaviour of the Optimizer

Paper title

Deforming the Loss Surface to Affect the Behaviour of the Optimizer

Authors

Chen, Liangming; Jin, Long; Du, Xiujuan; Li, Shuai; Liu, Mei

Abstract

In deep learning, the optimization process is usually assumed to take place on a loss surface of fixed shape. In contrast, this paper first proposes the novel concept of a deformation mapping, which deforms the loss surface in order to affect the behaviour of the optimizer. Vertical deformation mapping (VDM), one type of deformation mapping, can drive the optimizer into flat regions, which often implies better generalization performance. We design various VDMs and describe how each of them contributes to deforming the loss surface. After defining the local M region, our theoretical analyses show that deforming the loss surface can enhance the ability of the gradient descent optimizer to filter out sharp minima. Using visualizations of loss landscapes, we evaluate on CIFAR-100 the flatness of the minima obtained by the original optimizer and by optimizers enhanced with VDMs; the experimental results show that VDMs do find flatter regions. We further compare popular convolutional neural networks enhanced with VDMs against the corresponding original networks on ImageNet, CIFAR-10, and CIFAR-100. The results are surprising: every model equipped with VDMs improves significantly. For example, the top-1 test accuracy of ResNet-20 on CIFAR-100 increases by 1.46%, with negligible additional computational overhead.
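The abstract does not give the formulas of the VDMs. As a minimal sketch, assuming a vertical deformation acts on the loss value itself (the optimizer descends on f(L(w)) for some strictly increasing scalar map f), the chain rule gives the gradient f'(L(w)) * L'(w). The minimizers are therefore unchanged, while the gradient, and hence the effective step size, is rescaled by f'(L). The map f(L) = exp(ALPHA * L) - 1, the constant ALPHA, the helper names, and the toy quadratic loss below are hypothetical illustrations, not the VDMs designed in the paper.

```python
import math
from typing import Callable

# Hypothetical scale of the illustrative deformation; not a value from the paper.
ALPHA = 1.0


def deform(l: float) -> float:
    """Hypothetical vertical deformation f(L) = exp(ALPHA * L) - 1.

    Strictly increasing with f(0) = 0, so it keeps the minimizers of L.
    """
    return math.exp(ALPHA * l) - 1.0


def deform_prime(l: float) -> float:
    """Derivative f'(L) = ALPHA * exp(ALPHA * L) of the hypothetical deformation."""
    return ALPHA * math.exp(ALPHA * l)


def loss(w: float) -> float:
    """Toy 1-D loss with its minimum at w = 1 (illustrative assumption)."""
    return (w - 1.0) ** 2


def dloss(w: float) -> float:
    """Analytic derivative of the toy loss."""
    return 2.0 * (w - 1.0)


def deformed_grad(w: float) -> float:
    """Gradient of the deformed loss f(L(w)) by the chain rule: f'(L(w)) * L'(w)."""
    return deform_prime(loss(w)) * dloss(w)


def gradient_descent(grad: Callable[[float], float], w0: float,
                     lr: float, steps: int) -> float:
    """Plain gradient descent on a scalar parameter."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w


if __name__ == "__main__":
    # Compare raw and deformed gradients far from and close to the minimum.
    for w in (2.0, 1.01):
        print(f"w={w}: dL/dw={dloss(w):.4f}, d f(L)/dw={deformed_grad(w):.4f}")
    # Both runs should approach the same minimizer w = 1.
    print(gradient_descent(dloss, w0=2.0, lr=0.1, steps=200))
    print(gradient_descent(deformed_grad, w0=2.0, lr=0.1, steps=200))
```

At w = 2.0 the toy loss is 1, so the deformed gradient is 2e (about 5.44) versus 2 for the original; at w = 1.01 the loss is close to 0, f'(L) is close to 1, and the two gradients nearly coincide. The toy only illustrates this gradient rescaling; it does not reproduce the paper's analysis of sharp versus flat minima.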