Paper title
Taming neural networks with TUSLA: Non-convex learning via adaptive stochastic gradient Langevin algorithms
Paper authors
Paper abstract
Artificial neural networks (ANNs) are typically highly nonlinear systems which are finely tuned via the optimization of their associated, non-convex loss functions. In many cases, the gradient of any such loss function has superlinear growth, making the use of the widely-accepted (stochastic) gradient descent methods, which are based on Euler numerical schemes, problematic. We offer a new learning algorithm based on an appropriately constructed variant of the popular stochastic gradient Langevin dynamics (SGLD), which is called the tamed unadjusted stochastic Langevin algorithm (TUSLA). We also provide a non-asymptotic analysis of the new algorithm's convergence properties in the context of non-convex learning problems with the use of ANNs. Thus, we provide finite-time guarantees for TUSLA to find approximate minimizers of both empirical and population risks. The roots of the TUSLA algorithm are based on the taming technology for diffusion processes with superlinear coefficients as developed in \citet{tamed-euler, SabanisAoAP} and for MCMC algorithms in \citet{tula}. Numerical experiments are presented which confirm the theoretical findings and illustrate the need for the use of the new algorithm in comparison to vanilla SGLD within the framework of ANNs.
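To make the taming idea concrete, here is a minimal NumPy sketch, not taken from the paper, contrasting a vanilla SGLD step with a tamed step on a toy loss f(theta) = theta^4 / 4, whose gradient theta^3 grows superlinearly. The taming factor 1 + sqrt(lam) * |theta|^2 below is an illustrative choice in the spirit of the tamed schemes cited above; the precise TUSLA recursion, including its regularization term, is given in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad(theta):
    """Gradient of the toy loss f(theta) = theta**4 / 4.
    It grows superlinearly -- the regime in which Euler-based
    (stochastic) gradient schemes like SGLD can diverge."""
    return theta ** 3

lam = 0.1                        # step size
beta = 1e8                       # inverse temperature: large => little noise
theta_sgld = theta_tamed = 5.0   # common starting point

for n in range(100):
    noise = np.sqrt(2.0 * lam / beta) * rng.standard_normal()

    # Vanilla SGLD step: theta <- theta - lam * grad(theta) + noise.
    # Frozen once the iterate has exploded, to avoid float overflow in the demo.
    if abs(theta_sgld) < 1e100:
        theta_sgld = theta_sgld - lam * grad(theta_sgld) + noise

    # Tamed step: shrink the drift by 1 + sqrt(lam) * |theta|^2 so the
    # effective increment stays bounded where the gradient is superlinear.
    drift = grad(theta_tamed) / (1.0 + np.sqrt(lam) * theta_tamed ** 2)
    theta_tamed = theta_tamed - lam * drift + noise

print(f"vanilla SGLD after 100 steps: {theta_sgld:.3e}")  # magnitude > 1e250
print(f"tamed step after 100 steps:   {theta_tamed:.3e}") # near the minimizer 0
```

With this step size the vanilla iterate escapes within a handful of steps, while the tamed iterate decays steadily toward the minimizer at 0, consistent with the comparison to vanilla SGLD described in the abstract.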