Paper Title
Binary Search and First Order Gradient Based Method for Stochastic Optimization
Paper Authors
Paper Abstract
In this paper, we present a novel stochastic optimization method, called Binary Search Gradient Optimization (BSG), or BiGrad, which combines a binary search technique with a first order gradient based optimization method. In this optimization setup, a non-convex surface is treated as a set of convex surfaces. BSG first defines a region and assumes that it is convex. If the region is not convex, the algorithm leaves it quickly and defines a new one; otherwise, it tries to converge to the optimal point of the region. The core purpose of the binary search is to decide, in logarithmic time, whether the region is convex, whereas the first order gradient based method is primarily used to define a new region. In this paper, Adam is used as the first order gradient based method; nevertheless, other methods of this class may also be considered. In a deep neural network setup, BSG handles the problems of vanishing and exploding gradients efficiently. We evaluate BSG on the MNIST handwritten digit, IMDB, and CIFAR10 data sets, using logistic regression and deep neural networks. We produce more promising results as compared to other first order gradient based optimization methods. Furthermore, the proposed algorithm generalizes significantly better on unseen data than other methods.
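The abstract describes BSG only at a high level, so the following is a minimal, illustrative sketch on a 1-D toy problem rather than the authors' actual algorithm: the symmetric region around the current point, the gradient-sign bracketing test for convexity, the bisection loop, and the plain gradient step used as a stand-in for Adam are all assumptions made for illustration.

```python
# Illustrative sketch only -- NOT the authors' BSG/BiGrad algorithm.
# It mirrors the idea in the abstract on a 1-D toy problem: treat the
# current interval as if it were convex, use binary search (bisection on
# the gradient sign) to converge inside it in logarithmic time, and fall
# back to a plain first order gradient step (a stand-in for Adam) to
# leave the region and define a new one when convexity does not hold.
import math

def grad(f, x, eps=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

def bsg_sketch(f, x0, half_width=1.0, lr=0.1, outer_iters=50, bisect_iters=30):
    """Toy 'binary search + gradient' loop; every parameter here is assumed."""
    x = float(x0)
    for _ in range(outer_iters):
        lo, hi = x - half_width, x + half_width        # define a region around x
        if grad(f, lo) < 0.0 < grad(f, hi):            # region brackets a minimum: treat it as convex
            for _ in range(bisect_iters):              # binary search on the gradient sign
                mid = 0.5 * (lo + hi)
                if grad(f, mid) > 0.0:
                    hi = mid
                else:
                    lo = mid
            x = 0.5 * (lo + hi)
        else:                                          # region is not convex: leave it with a gradient step
            x -= lr * grad(f, x)
    return x

if __name__ == "__main__":
    f = lambda x: (x - 3.0) ** 2 + 0.5 * math.sin(5.0 * x)   # mildly non-convex toy objective
    print("approximate minimizer:", bsg_sketch(f, x0=-4.0))
```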