Paper Title


Do optimization methods in deep learning applications matter?

Authors

Buse Melis Ozyildirim, Mariam Kiran

Abstract


With advances in deep learning, exponential data growth, and increasing model complexity, developing efficient optimization methods is attracting much research attention. Several implementations favor Conjugate Gradient (CG) and Stochastic Gradient Descent (SGD) as practical and elegant solutions for achieving quick convergence; however, these optimization processes also present many limitations in learning across deep learning applications. Recent research explores higher-order optimization functions as better approaches, but these pose very complex computational challenges for practical use. Comparing first- and higher-order optimization functions, our experiments reveal that Levenberg-Marquardt (LM) achieves significantly better convergence but suffers from very large processing time, increasing the training complexity of both classification and reinforcement learning problems. Our experiments compare off-the-shelf optimization functions (CG, SGD, LM, and L-BFGS) on standard CIFAR, MNIST, CartPole, and FlappyBird experiments. The paper presents arguments on which optimization functions to use and, further, which functions would benefit from parallelization efforts to improve pretraining time and learning-rate convergence.
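For readers who want to run a comparison of this kind themselves, the sketch below contrasts a first-order optimizer (SGD) with a quasi-Newton one (L-BFGS) on MNIST using off-the-shelf PyTorch optimizers. This is an illustrative assumption of the setup only, not the authors' code: the model, learning rates, and single-epoch loop are placeholders.

```python
# Minimal sketch (not from the paper): comparing off-the-shelf PyTorch
# optimizers on MNIST. Model size and hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

def make_model():
    # Small fully connected classifier; the paper's architectures may differ.
    return nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128),
                         nn.ReLU(), nn.Linear(128, 10)).to(device)

train_loader = torch.utils.data.DataLoader(
    datasets.MNIST("data", train=True, download=True,
                   transform=transforms.ToTensor()),
    batch_size=128, shuffle=True)

def train_one_epoch(model, optimizer):
    model.train()
    for x, y in train_loader:
        x, y = x.to(device), y.to(device)

        def closure():
            # L-BFGS re-evaluates the loss during its line search, so PyTorch
            # requires a closure; first-order optimizers simply call it once.
            optimizer.zero_grad()
            loss = F.cross_entropy(model(x), y)
            loss.backward()
            return loss

        loss = optimizer.step(closure)
    return float(loss)

# Compare a first-order (SGD) and a quasi-Newton (L-BFGS) optimizer.
for name, make_opt in {
    "SGD": lambda p: torch.optim.SGD(p, lr=0.01, momentum=0.9),
    "L-BFGS": lambda p: torch.optim.LBFGS(p, lr=0.1, history_size=10),
}.items():
    model = make_model()
    final_loss = train_one_epoch(model, make_opt(model.parameters()))
    print(f"{name}: final batch loss {final_loss:.4f}")
```

As the abstract notes for LM, the higher-order method here (L-BFGS) typically costs far more per step than SGD, which is the trade-off the paper's experiments quantify.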
