Paper Title
Quantized Adam with Error Feedback
Paper Authors
Paper Abstract
In this paper, we present a distributed variant of the adaptive stochastic gradient method for training deep neural networks in the parameter-server model. To reduce the communication cost between the workers and the server, we incorporate two types of quantization schemes, i.e., gradient quantization and weight quantization, into the proposed distributed Adam. In addition, to reduce the bias introduced by the quantization operations, we propose an error-feedback technique to compensate for the quantized gradient. Theoretically, in the stochastic nonconvex setting, we show that the distributed adaptive gradient method with gradient quantization and error feedback converges to a first-order stationary point, and that the distributed adaptive gradient method with weight quantization and error feedback converges to a point related to the quantization level, under both the single-worker and multi-worker modes. Finally, we apply the proposed distributed adaptive gradient methods to train deep neural networks. Experimental results demonstrate the efficacy of our methods.
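To make the error-feedback idea concrete, below is a minimal single-worker sketch in NumPy. The scaled sign quantizer `quantize_sign`, the function names, and the state layout are illustrative assumptions for exposition, not the paper's exact algorithm.

```python
import numpy as np

def quantize_sign(v):
    # Scaled sign quantizer (an assumed example of a quantizer Q):
    # transmits only the signs of v, scaled by the mean absolute value
    # so the overall magnitude is roughly preserved.
    return (np.abs(v).sum() / v.size) * np.sign(v)

def worker_step(grad, error):
    # Error feedback: add the residual left over from earlier
    # quantization steps before quantizing the current gradient.
    corrected = grad + error
    quantized = quantize_sign(corrected)  # sent to the server
    error = corrected - quantized         # residual kept locally for the next step
    return quantized, error
```

Under this sketch, the server would aggregate the quantized gradients it receives and feed them into Adam's moment estimates; the locally accumulated residual ensures that information discarded by the quantizer is reintroduced in later iterations rather than lost.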