Paper Title

Normalized gradient flow optimization in the training of ReLU artificial neural networks

Paper Authors

Simon Eberle, Arnulf Jentzen, Adrian Riekert, Georg Weiss

Paper Abstract

The training of artificial neural networks (ANNs) is nowadays a highly relevant algorithmic procedure with many applications in science and industry. Roughly speaking, ANNs can be regarded as iterated compositions of affine linear functions and certain fixed nonlinear functions, which are usually multidimensional versions of a one-dimensional so-called activation function. The most popular choice of such a one-dimensional activation function is the rectified linear unit (ReLU) activation function, which maps a real number to its positive part: $ \mathbb{R} \ni x \mapsto \max\{ x, 0 \} \in \mathbb{R} $. In this article we propose and analyze a modified variant of the standard training procedure of such ReLU ANNs: we restrict the negative gradient flow dynamics to a large submanifold of the ANN parameter space. This submanifold is a strict $ C^{ \infty } $-submanifold of the entire ANN parameter space that seems to enjoy better regularity properties than the entire ANN parameter space, yet it is also sufficiently large and sufficiently high-dimensional to represent all ANN realization functions that can be represented through the entire ANN parameter space. In the special situation of shallow ANNs with just one-dimensional ANN layers we also prove, for every Lipschitz continuous target function, that every gradient flow trajectory on this large submanifold of the ANN parameter space is globally bounded. For the standard gradient flow on the entire ANN parameter space with Lipschitz continuous target functions, it remains an open research problem to prove or disprove the global boundedness of gradient flow trajectories, even in the situation of shallow ANNs with just one-dimensional ANN layers.
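The abstract does not spell out the defining equations of the submanifold, so the following Python sketch is an illustration only, not the authors' construction. It uses the positive homogeneity of ReLU ($\mathrm{relu}(s z) = s\,\mathrm{relu}(z)$ for $s > 0$) to renormalize the inner parameters of each hidden neuron of a shallow ANN with one-dimensional layers after every explicit Euler step of the negative gradient flow. The constraint $w_j^2 + b_j^2 = 1$, the finite-difference gradient, and all function names are assumptions made for this sketch.

```python
import numpy as np

def relu(x):
    """ReLU activation: maps x to its positive part max{x, 0}."""
    return np.maximum(x, 0.0)

def realization(theta, x, n):
    """Realization of a shallow ReLU ANN with one-dimensional layers:
    N_theta(x) = sum_j v_j * relu(w_j * x + b_j) + c."""
    w, b, v, c = theta[:n], theta[n:2*n], theta[2*n:3*n], theta[3*n]
    return float(v @ relu(w * x + b) + c)

def risk(theta, xs, target, n):
    """Empirical L2 risk against a (Lipschitz continuous) target function."""
    return np.mean([(realization(theta, x, n) - target(x)) ** 2 for x in xs])

def num_grad(theta, xs, target, n, eps=1e-6):
    """Central finite-difference gradient of the risk (illustration only)."""
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        g[i] = (risk(theta + e, xs, target, n)
                - risk(theta - e, xs, target, n)) / (2.0 * eps)
    return g

def project(theta, n):
    """Renormalize each hidden neuron so that w_j^2 + b_j^2 = 1, absorbing
    the scale into v_j.  By positive homogeneity of ReLU this leaves the
    realization function (and hence the risk) unchanged."""
    w, b, v = theta[:n].copy(), theta[n:2*n].copy(), theta[2*n:3*n].copy()
    s = np.sqrt(w**2 + b**2)  # assumed nonzero, as for generic initializations
    return np.concatenate([w / s, b / s, v * s, theta[3*n:]])

# Usage: explicit Euler discretization of the negative gradient flow,
# projected back onto the normalization constraint after every step.
rng = np.random.default_rng(0)
n = 8                                  # number of hidden neurons
theta = project(rng.normal(size=3*n + 1), n)
xs = np.linspace(-1.0, 1.0, 64)        # sample points in the input domain
target = np.abs                        # a Lipschitz continuous target function
for _ in range(200):
    theta = project(theta - 0.05 * num_grad(theta, xs, target, n), n)
print("final risk:", risk(theta, xs, target, n))
```

Because the projection rescales $(w_j, b_j, v_j)$ without changing the realized function, the constrained dynamics explores the same set of realization functions as the unconstrained one, which is the qualitative point the abstract makes about the submanifold; whether this particular normalization coincides with the paper's submanifold would have to be checked against the article itself.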
