Paper Title

Critical Point-Finding Methods Reveal Gradient-Flat Regions of Deep Network Losses

Paper Authors

Frye, Charles G., Simon, James, Wadia, Neha S., Ligeralde, Andrew, DeWeese, Michael R., Bouchard, Kristofer E.

Paper Abstract

Despite the fact that the loss functions of deep neural networks are highly non-convex, gradient-based optimization algorithms converge to approximately the same performance from many random initial points. One thread of work has focused on explaining this phenomenon by characterizing the local curvature near critical points of the loss function, where the gradients are near zero, and demonstrating that neural network losses enjoy a no-bad-local-minima property and an abundance of saddle points. We report here that the methods used to find these putative critical points suffer from a bad local minima problem of their own: they often converge to or pass through regions where the gradient norm has a stationary point. We call these gradient-flat regions, since they arise when the gradient is approximately in the kernel of the Hessian, such that the loss is locally approximately linear, or flat, in the direction of the gradient. We describe how the presence of these regions necessitates care in both interpreting past results that claimed to find critical points of neural network losses and in designing second-order methods for optimizing neural networks.
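To make the abstract's central condition concrete, here is a minimal, illustrative JAX sketch (an assumption-laden toy, not code or results from the paper). Critical-point-finding methods typically descend the squared gradient norm g(w) = 0.5‖∇L(w)‖²; its gradient is the Hessian-vector product H(w)∇L(w), so a stationary point of g with ∇L ≠ 0 is exactly the gradient-flat case, where ∇L lies approximately in the kernel of H. The toy loss and the helper gradient_flat_diagnostic are hypothetical names chosen for illustration.

```python
import jax
import jax.numpy as jnp

# Hypothetical toy loss, NOT one of the paper's networks: a quadratic saddle in
# (w0, w1) plus an exactly linear direction w2, so the Hessian has a zero
# eigenvalue along w2.
def loss(w):
    return w[0] ** 2 - w[1] ** 2 + w[2]

grad_fn = jax.grad(loss)

def gradient_flat_diagnostic(w, tol=1e-6):
    """Classify w by the stationarity condition of g(w) = 0.5 * ||∇L(w)||^2.

    ∇g = H(w) @ ∇L(w), so a stationary point of the gradient norm with a
    nonzero gradient means ∇L lies (approximately) in the kernel of H:
    the "gradient-flat" situation described in the abstract.
    """
    # Hessian-vector product H @ ∇L via jvp of the gradient; no explicit Hessian.
    g, hg = jax.jvp(grad_fn, (w,), (grad_fn(w),))
    if jnp.linalg.norm(g) < tol:
        return "critical point: ∇L ≈ 0"
    if jnp.linalg.norm(hg) < tol:
        return "gradient-flat: H∇L ≈ 0 but ∇L ≠ 0 (gradient-norm descent stalls)"
    return "neither: the gradient norm still has nonzero slope"

# Example usage on the toy loss.
print(gradient_flat_diagnostic(jnp.array([0.0, 0.0, 5.0])))  # -> gradient-flat
print(gradient_flat_diagnostic(jnp.array([1.0, 0.0, 5.0])))  # -> neither
```

On this toy loss, any point with w[0] = w[1] = 0 is gradient-flat: the gradient points along the exactly linear w[2] direction, so descent on the gradient norm reaches a stationary point even though no critical point of the loss exists there.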
