Paper Title

Moniqua: Modulo Quantized Communication in Decentralized SGD

Authors

Lu, Yucheng; De Sa, Christopher

Abstract

Running Stochastic Gradient Descent (SGD) in a decentralized fashion has shown promising results. In this paper we propose Moniqua, a technique that allows decentralized SGD to use quantized communication. We prove in theory that Moniqua communicates a provably bounded number of bits per iteration, while converging at the same asymptotic rate as the original algorithm does with full-precision communication. Moniqua improves upon prior works in that it (1) requires zero additional memory, (2) works with 1-bit quantization, and (3) is applicable to a variety of decentralized algorithms. We demonstrate empirically that Moniqua converges faster with respect to wall clock time than other quantized decentralized algorithms. We also show that Moniqua is robust to very low bit-budgets, allowing 1-bit-per-parameter communication without compromising validation accuracy when training ResNet20 and ResNet110 on CIFAR10.
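The abstract does not spell out the mechanism, so the following is only an illustrative sketch of the general idea the name suggests, not the paper's algorithm. It assumes that a sender's value `x` and a receiver's local value `y` are known to differ by less than a bound `theta`; under that assumption the sender can transmit only a coarsely quantized `x mod B`, and the receiver can undo the wrap-around using `y`. The function names, the modulus `B = 8 * theta`, and the midpoint quantizer are all hypothetical choices made for this example.

```python
import numpy as np

def modulo_encode(x, B, bits):
    """Sender side (hypothetical helper): wrap x into [0, B), quantize to 2**bits cells."""
    levels = 2 ** bits
    frac = np.mod(x / B, 1.0)                      # position inside one period
    code = np.floor(frac * levels).astype(np.int64)
    return np.minimum(code, levels - 1)            # guard: frac can round up to 1.0

def modulo_decode(code, y, B, bits):
    """Receiver side (hypothetical helper): recover an estimate of x from code and local y."""
    levels = 2 ** bits
    q = (code + 0.5) / levels * B                  # midpoint of the quantization cell
    # Pick the representative of q (mod B) that lies within B/2 of y.
    return np.mod(q - y + B / 2, B) - B / 2 + y

rng = np.random.default_rng(0)
theta = 0.1                  # assumed bound on |x - y|
B = 8 * theta                # assumed modulus; leaves room for quantization error
bits = 1                     # one bit per parameter

x = 10.0 * rng.standard_normal(1000)               # sender's parameters
y = x + rng.uniform(-theta, theta, size=x.shape)   # receiver's nearby parameters

codes = modulo_encode(x, B, bits)
x_hat = modulo_decode(codes, y, B, bits)
max_err = float(np.max(np.abs(x_hat - x)))
print("max reconstruction error:", max_err)
```

In this sketch the reconstruction error is bounded by half a quantization cell, `B / (2 * 2**bits)`, regardless of how large `x` itself is, and nothing beyond `y` needs to be stored on the receiver. The recovery is only valid while `|x - y|` plus the quantization error stays below `B / 2`, which is why `B` is chosen with some margin over `theta`.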
