Paper Title
Weak error analysis for stochastic gradient descent optimization algorithms
Paper Authors
Paper Abstract
Stochastic gradient descent (SGD) type optimization schemes are fundamental ingredients in a large number of machine learning based algorithms. In particular, SGD type optimization schemes are frequently employed in applications involving natural language processing, object and face recognition, fraud detection, computational advertisement, and numerical approximations of partial differential equations. In mathematical convergence results for SGD type optimization schemes there are usually two types of error criteria studied in the scientific literature, that is, the error in the strong sense and the error with respect to the objective function. In applications one is often not only interested in the size of the error with respect to the objective function but also in the size of the error with respect to a test function which is possibly different from the objective function. The analysis of the size of this error is the subject of this article. In particular, the main result of this article proves under suitable assumptions that the size of this error decays at the same speed as in the special case where the test function coincides with the objective function.
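To make the distinction concrete, the following is a minimal sketch (not taken from the paper) of the weak error notion the abstract describes: for SGD iterates Θ_N approximating a minimizer θ* of an objective f, one measures |E[φ(Θ_N)] − φ(θ*)| for a test function φ that may differ from f. The toy objective f(θ) = E[(θ − X)²/2] with X ~ N(μ, 1), the step sizes, and the test functions below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem (not from the paper): minimize the objective
# f(theta) = E[(theta - X)^2 / 2] with X ~ N(mu, 1), minimized at theta* = mu.
mu = 2.0
theta_star = mu

def sgd(theta0, n_steps):
    """Run SGD with step sizes gamma_n = 1/(n+1) and return the final iterate."""
    theta = theta0
    for n in range(n_steps):
        x = rng.normal(mu, 1.0)
        grad = theta - x          # unbiased estimate of f'(theta)
        theta -= grad / (n + 1)
    return theta

def weak_error(phi, n_steps, n_samples=2000):
    """Monte Carlo estimate of the weak error |E[phi(Theta_N)] - phi(theta*)|."""
    vals = [phi(sgd(0.0, n_steps)) for _ in range(n_samples)]
    return abs(np.mean(vals) - phi(theta_star))

phi_obj = lambda t: 0.5 * (t - mu) ** 2 + 0.5   # the objective f itself
phi_test = lambda t: np.sin(t)                  # a different test function

for N in (10, 40, 160):
    print(N, weak_error(phi_obj, N), weak_error(phi_test, N))
```

Empirically, both error sizes shrink as N grows at a comparable rate, illustrating the abstract's claim that (under suitable assumptions) the weak error for a general test function decays at the same speed as in the special case where the test function coincides with the objective.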