Paper Title

Linearly Converging Error Compensated SGD

Authors

Eduard Gorbunov, Dmitry Kovalev, Dmitry Makarenko, Peter Richtárik

Abstract

In this paper, we propose a unified analysis of variants of distributed SGD with arbitrary compressions and delayed updates. Our framework is general enough to cover different variants of quantized SGD, Error-Compensated SGD (EC-SGD) and SGD with delayed updates (D-SGD). Via a single theorem, we derive the complexity results for all the methods that fit our framework. For the existing methods, this theorem gives the best-known complexity results. Moreover, using our general scheme, we develop new variants of SGD that combine variance reduction or arbitrary sampling with error feedback and quantization, and derive convergence rates for these methods that beat the state-of-the-art results. To illustrate the strength of our framework, we develop 16 new methods that fit it. In particular, we propose the first method, called EC-SGD-DIANA, that is based on error feedback for biased compression operators and quantization of gradient differences, and we prove convergence guarantees showing that EC-SGD-DIANA converges to the exact optimum asymptotically in expectation with a constant learning rate for both convex and strongly convex objectives when workers compute full gradients of their loss functions. Moreover, for the case when the loss function of each worker has a finite-sum form, we modify this method and obtain a new one, called EC-LSVRG-DIANA, which is the first distributed stochastic method with error feedback and variance reduction that converges to the exact optimum asymptotically in expectation with a constant learning rate.
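To make the error-feedback mechanism referenced in the abstract concrete, below is a minimal sketch of error-compensated SGD with a Top-k compressor on a toy quadratic problem. This is an illustrative sketch, not the paper's algorithmic specification: the quadratic objective, the Top-k compressor choice, and all parameter values (gamma, k, steps) are assumptions made here for demonstration, and workers use full local gradients for simplicity.

# A minimal sketch of error-compensated SGD (EC-SGD) with a Top-k compressor.
# Illustrative only: the objective, compressor, and step size are assumptions,
# not taken from the paper.

import numpy as np

def top_k(v, k):
    # Keep the k largest-magnitude entries of v, zero out the rest (a biased compressor).
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def ec_sgd(A_list, b_list, gamma=0.02, k=2, steps=3000):
    # Error-compensated SGD on n workers, each holding a quadratic loss
    # f_i(x) = 0.5 * ||A_i x - b_i||^2 (full local gradients for simplicity).
    n, d = len(A_list), A_list[0].shape[1]
    x = np.zeros(d)
    errors = [np.zeros(d) for _ in range(n)]    # local error vectors e_i
    for _ in range(steps):
        deltas = []
        for i in range(n):
            g = A_list[i].T @ (A_list[i] @ x - b_list[i])  # local gradient
            v = errors[i] + gamma * g           # add back the accumulated error
            delta = top_k(v, k)                 # transmit only the compressed part
            errors[i] = v - delta               # store what was not transmitted
            deltas.append(delta)
        x -= np.mean(deltas, axis=0)            # server averages compressed updates
    return x

# Toy usage: 4 workers, 5-dimensional problem.
rng = np.random.default_rng(0)
A_list = [rng.normal(size=(10, 5)) for _ in range(4)]
b_list = [rng.normal(size=10) for _ in range(4)]
print(ec_sgd(A_list, b_list))

The key design point shown here is that each worker adds its stored residual back into the next update before compressing, so information dropped by the biased compressor is not lost but fed back in later steps.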
