Paper Title

PowerGossip: Practical Low-Rank Communication Compression in Decentralized Deep Learning

Paper Authors

Thijs Vogels, Sai Praneeth Karimireddy, Martin Jaggi

Paper Abstract

Lossy gradient compression has become a practical tool to overcome the communication bottleneck in centrally coordinated distributed training of machine learning models. However, algorithms for decentralized training with compressed communication over arbitrary connected networks have been more complicated, requiring additional memory and hyperparameters. We introduce a simple algorithm that directly compresses the model differences between neighboring workers using low-rank linear compressors. Inspired by the PowerSGD algorithm for centralized deep learning, this algorithm uses power iteration steps to maximize the information transferred per bit. We prove that our method requires no additional hyperparameters, converges faster than prior methods, and is asymptotically independent of both the network and the compression. Out of the box, these compressors perform on par with state-of-the-art tuned compression algorithms in a series of deep learning benchmarks.
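
To make the description above concrete, here is a minimal sketch (in NumPy) of the core idea: approximate the model difference between two neighboring workers with a rank-1 power-iteration step and use that low-rank estimate in a gossip-style averaging update. This is not the authors' reference implementation; it ignores the alternating communication protocol and the multi-worker network topology, and the names `power_compress_step`, `gossip_update`, and `averaging_rate` are illustrative assumptions.

```python
# Minimal sketch of rank-1 power-iteration compression of a model
# difference, followed by a gossip-style averaging update.
# All function and variable names are illustrative, not from the paper.
import numpy as np

def power_compress_step(diff: np.ndarray, q: np.ndarray):
    """One power-iteration step on the difference matrix `diff`.

    Returns a rank-1 approximation of `diff` and the updated right
    vector, which is reused (warm-started) in the next round so that
    repeated steps converge toward the top singular pair.
    """
    p = diff @ q                      # left step
    p /= np.linalg.norm(p) + 1e-12    # normalize the left vector
    q_new = diff.T @ p                # right step (carries the scale)
    return np.outer(p, q_new), q_new

def gossip_update(w_i, w_j, q, averaging_rate=0.5):
    """Move worker i's parameters toward worker j using only a
    low-rank (compressed) estimate of their difference."""
    approx_diff, q_new = power_compress_step(w_j - w_i, q)
    return w_i + averaging_rate * approx_diff, q_new

# Tiny usage example with random "model" matrices for two workers.
rng = np.random.default_rng(0)
w_i, w_j = rng.normal(size=(64, 32)), rng.normal(size=(64, 32))
q = rng.normal(size=32)               # persistent right vector
for _ in range(5):                    # a few gossip rounds
    w_i, q = gossip_update(w_i, w_j, q)
print(np.linalg.norm(w_j - w_i))      # the gap shrinks over rounds
```

In this sketch, warm-starting `q` across rounds makes successive steps act like power iteration on the evolving difference, concentrating each transmitted rank-1 message on the directions where the two workers disagree most.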
