Paper Title

Ill-Posedness and Optimization Geometry for Nonlinear Neural Network Training

Authors

Thomas O'Leary-Roseberry, Omar Ghattas

Abstract

In this work we analyze the role nonlinear activation functions play at stationary points of dense neural network training problems. We consider a generic least squares loss function training formulation. We show that the nonlinear activation functions used in the network construction play a critical role in classifying stationary points of the loss landscape. We show that for shallow dense networks, the nonlinear activation function determines the Hessian nullspace in the vicinity of global minima (if they exist), and therefore determines the ill-posedness of the training problem. Furthermore, for shallow nonlinear networks we show that the zeros of the activation function and its derivatives can lead to spurious local minima, and discuss conditions for strict saddle points. We extend these results to deep dense neural networks, showing that the last activation function plays an important role in classifying stationary points, due to how it shows up in the gradient from the chain rule.
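The abstract's central claim about ill-posedness can be made concrete. With residuals r_i(theta) = f(x_i; theta) - y_i, the generic least squares loss is L(theta) = (1/2) * sum_i ||r_i(theta)||^2, and at a zero-residual global minimum the Hessian reduces to the Gauss-Newton term J^T J, where J is the Jacobian of the residuals with respect to the parameters; the Hessian nullspace therefore coincides with ker(J), which is where the activation function enters. The following is a minimal numerical sketch of this check, not code from the paper: it assumes JAX is available, and all names (unpack, model, the teacher-data construction) and sizes are illustrative.

```python
# Minimal sketch (assumes JAX; not code from the paper): at a zero-residual
# minimum of a least squares loss, the Hessian equals the Gauss-Newton term
# J^T J, so its nullspace matches that of the residual Jacobian J.
import jax
import jax.numpy as jnp

def unpack(theta, d_in, width):
    """Split a flat parameter vector into (W1, b1, w2, b2) for a shallow net."""
    i = 0
    W1 = theta[i:i + width * d_in].reshape(width, d_in); i += width * d_in
    b1 = theta[i:i + width]; i += width
    w2 = theta[i:i + width]; i += width
    b2 = theta[i]
    return W1, b1, w2, b2

def model(theta, X, d_in, width, act=jnp.tanh):
    # Shallow dense network with scalar output and nonlinear activation `act`.
    W1, b1, w2, b2 = unpack(theta, d_in, width)
    return act(X @ W1.T + b1) @ w2 + b2

def loss(theta, X, y, d_in, width):
    # Generic least squares training formulation.
    r = model(theta, X, d_in, width) - y
    return 0.5 * jnp.sum(r ** 2)

d_in, width, n = 2, 3, 20
k1, k2 = jax.random.split(jax.random.PRNGKey(0))
theta_star = jax.random.normal(k1, (width * d_in + 2 * width + 1,))
X = jax.random.normal(k2, (n, d_in))
# Teacher data: by construction theta_star is a zero-residual global minimum.
y = model(theta_star, X, d_in, width)

H = jax.hessian(loss)(theta_star, X, y, d_in, width)      # loss Hessian at the minimum
J = jax.jacobian(model)(theta_star, X, d_in, width)       # residual Jacobian at the minimum

eigH = jnp.linalg.eigvalsh(H)
tol = 1e-6 * eigH.max()
print("near-zero Hessian eigenvalues:", int(jnp.sum(eigH < tol)))
print("Jacobian rank deficiency:", int(theta_star.size - jnp.linalg.matrix_rank(J)))
```

Because theta_star generates the data, the residual vanishes there, and the two printed counts should agree: any flat directions of the training problem at this global minimum are exactly the rank deficiency of the network Jacobian, whose structure depends on the chosen activation function.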
