Paper Title

Accordion: Adaptive Gradient Communication via Critical Learning Regime Identification

Paper Authors

Saurabh Agarwal, Hongyi Wang, Kangwook Lee, Shivaram Venkataraman, Dimitris Papailiopoulos

Paper Abstract

Distributed model training suffers from communication bottlenecks due to frequent model updates transmitted across compute nodes. To alleviate these bottlenecks, practitioners use gradient compression techniques like sparsification, quantization, or low-rank updates. These techniques usually require choosing a static compression ratio, often forcing users to balance a trade-off between model accuracy and per-iteration speedup. In this work, we show that the performance degradation caused by choosing a high compression ratio is not fundamental. An adaptive compression strategy can reduce communication while maintaining final test accuracy. Inspired by recent findings on critical learning regimes, in which small gradient errors can have an irrecoverable impact on model performance, we propose Accordion, a simple yet effective adaptive compression algorithm. While Accordion maintains a sufficiently high compression rate on average, it avoids over-compressing gradients during critical learning regimes, which it detects with a simple gradient-norm-based criterion. Our extensive experimental study over a number of machine learning tasks in distributed environments indicates that Accordion maintains model accuracy similar to uncompressed training, yet achieves up to 5.5x better compression and up to 4.1x end-to-end speedup over static approaches. We show that Accordion also works for adjusting the batch size, another popular strategy for alleviating communication bottlenecks.
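To make the norm-based switching rule concrete, below is a minimal, self-contained Python sketch of an Accordion-style criterion that picks a low-rank compression level per detection interval. The threshold `eta`, the rank values, the `is_critical`/`select_rank` helpers, and the simulated gradient norms are all illustrative assumptions, not values or APIs from the paper.

```python
def is_critical(prev_norm: float, curr_norm: float, eta: float = 0.5) -> bool:
    """Flag a critical regime when the gradient norm changes rapidly
    between detection intervals (relative change >= eta, an assumed value)."""
    if prev_norm == 0.0:
        return True  # no history yet: compress less, to be safe
    return abs(prev_norm - curr_norm) / prev_norm >= eta


def select_rank(prev_norm: float, curr_norm: float,
                low_rank: int = 1, high_rank: int = 4) -> int:
    """Pick the rank for a low-rank gradient compressor: a higher rank
    means more communication but smaller compression error."""
    return high_rank if is_critical(prev_norm, curr_norm) else low_rank


if __name__ == "__main__":
    # Simulated per-interval gradient norms: rapid early decay (critical
    # regime), then a plateau where aggressive compression is safe.
    norms = [10.0, 4.0, 2.5, 2.2, 2.1, 2.05]
    prev = 0.0
    for t, norm in enumerate(norms):
        rank = select_rank(prev, norm)
        print(f"interval {t}: grad norm {norm:.2f} -> rank {rank}")
        prev = norm
```

In a real distributed setting, the chosen rank would parameterize a low-rank compressor such as PowerSGD (or, analogously, set a sparsification or quantization level), and the same switching rule could instead scale the batch size, as the abstract notes.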
