Paper Title


APMSqueeze: A Communication Efficient Adam-Preconditioned Momentum SGD Algorithm

Paper Authors

Hanlin Tang, Shaoduo Gan, Samyam Rajbhandari, Xiangru Lian, Ji Liu, Yuxiong He, Ce Zhang

Abstract


Adam is an important optimization algorithm that helps guarantee efficiency and accuracy for training many important tasks such as BERT and ImageNet. However, Adam is generally not compatible with information (gradient) compression techniques, so communication usually becomes the bottleneck when parallelizing Adam. In this paper, we propose a communication-efficient {\bf A}dam-{\bf P}reconditioned {\bf M}omentum SGD algorithm, named APMSqueeze, which compresses gradients through an error-compensated method. The proposed algorithm achieves a convergence efficiency similar to Adam in terms of epochs, but significantly reduces the running time per epoch. In terms of end-to-end performance (including the full-precision pre-condition step), APMSqueeze is able to provide a speed-up, sometimes by up to $2$-$10\times$, depending on network bandwidth. We also conduct theoretical analysis of the convergence and efficiency.
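The abstract describes two ingredients: a fixed, full-precision Adam-style preconditioner and an error-compensated compression of the momentum that is communicated between workers. The sketch below is a minimal single-worker illustration of that pattern, not the paper's actual implementation; the `compress` function (a 1-bit-style sign-and-scale quantizer), the state layout, and the assumption that the variance term `v` comes precomputed from a warm-up phase are all our own illustrative choices.

```python
import numpy as np

def compress(x):
    """Hypothetical 1-bit-style compressor: keep only the sign of each
    coordinate, scaled by the mean absolute value so magnitudes are
    roughly preserved. Real compressors may differ."""
    scale = np.mean(np.abs(x))
    return scale * np.sign(x)

def apmsqueeze_step(param, grad, state, lr=1e-3, beta=0.9, eps=1e-8):
    """One sketched update of error-compensated, Adam-preconditioned
    momentum SGD. `state` holds the local momentum `m`, the error
    buffer `e` carrying compression residuals, and a fixed variance
    preconditioner `v` (assumed precomputed in full precision)."""
    m, e, v = state["m"], state["e"], state["v"]
    m_new = beta * m + (1 - beta) * grad        # momentum update
    compressed = compress(m_new + e)            # compress momentum plus carried error
    state["e"] = (m_new + e) - compressed       # error compensation: keep the residual
    state["m"] = m_new                          # local momentum stays full precision
    # In a parallel setting, `compressed` is what would be communicated
    # (e.g. all-reduced) across workers before this preconditioned step.
    return param - lr * compressed / (np.sqrt(v) + eps)
```

The key property of error compensation is that the residual discarded by `compress` is not lost: it is folded into the next step's input, so quantization errors cancel over time instead of accumulating.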
