Paper Title
CSER: Communication-efficient SGD with Error Reset
Paper Authors
Paper Abstract
The scalability of Distributed Stochastic Gradient Descent (SGD) is today limited by communication bottlenecks. We propose a novel SGD variant: Communication-efficient SGD with Error Reset, or CSER. The key idea in CSER is, first, a new technique called "error reset" that adapts arbitrary compressors for SGD, producing bifurcated local models with periodic reset of the resulting local residual errors. Second, we introduce partial synchronization for both the gradients and the models, leveraging the advantages of both. We prove the convergence of CSER for smooth non-convex problems. Empirical results show that when combined with highly aggressive compressors, the CSER algorithms accelerate distributed training by nearly 10x for CIFAR-100, and by 4.5x for ImageNet.
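To make the "error reset" idea concrete, below is a minimal single-worker sketch in Python/NumPy, assuming a top-k compressor as the example compressor. The names `topk_compress` and `sgd_with_error_reset` are illustrative, not from the paper, and the sketch omits the multi-worker communication and the partial synchronization of gradients and models that the full CSER algorithm performs; it only shows how a bifurcated local model (global model plus local residual error) evolves and how the error is periodically reset.

```python
import numpy as np

def topk_compress(v, k):
    """Keep the k largest-magnitude entries of v; the rest becomes the residual."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out, v - out

def sgd_with_error_reset(grad_fn, x0, lr=0.1, k=10, reset_every=8, steps=200):
    """Single-worker sketch: x is the synchronized model, e the local residual
    error, so x + e is the bifurcated local model the worker descends on."""
    x, e = x0.copy(), np.zeros_like(x0)
    for t in range(1, steps + 1):
        g = grad_fn(x + e)               # gradient at the local (bifurcated) model
        c, r = topk_compress(lr * g, k)  # compress the update
        x -= c                           # compressed part would be communicated
        e -= r                           # residual stays in the local error
        if t % reset_every == 0:
            # periodic error reset: fold the residual into the model and zero it;
            # in distributed CSER this coincides with a model synchronization
            x, e = x + e, np.zeros_like(e)
    return x

# toy usage: minimize ||x||^2 / 2, whose gradient is x itself
x_final = sgd_with_error_reset(lambda x: x, np.random.randn(100), k=5)
print(np.linalg.norm(x_final))
```

The design point the sketch highlights is that, unlike plain error feedback where the residual can grow without bound under aggressive compression, the residual here is periodically reset, which is what lets CSER tolerate highly aggressive compressors.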