Paper Title

Shuffle-Exchange Brings Faster: Reduce the Idle Time During Communication for Decentralized Neural Network Training

Paper Authors

Yang, Xiang

Paper Abstract

As a crucial scheme for accelerating deep neural network (DNN) training, distributed stochastic gradient descent (DSGD) is widely adopted in many real-world applications. In most distributed deep learning (DL) frameworks, DSGD is implemented with the Ring-AllReduce architecture (Ring-SGD) and uses a computation-communication overlap strategy to hide the overhead of the massive communication DSGD requires. However, we observe that although each worker in Ring-SGD needs to communicate only $O(1)$ gradients, the $O(n)$ handshakes required by Ring-SGD limit its usage when training with many workers or in high-latency networks. In this paper, we propose Shuffle-Exchange SGD (SESGD) to resolve the dilemma of Ring-SGD. In a cluster of 16 workers with 0.1 ms Ethernet latency, SESGD accelerates DNN training by up to $1.7\times$ without losing model accuracy. Moreover, in high-latency networks (5 ms), training can be accelerated by up to $5\times$.
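To make the abstract's complexity claim concrete, below is a minimal Python sketch of the standard Ring-AllReduce schedule that Ring-SGD builds on. This is generic background rather than the paper's SESGD algorithm, and the function name and list-based gradient chunks are illustrative assumptions. Each of the $2(n-1)$ steps is a dependent neighbor handshake, so the latency term grows linearly with the number of workers $n$, while the total data each worker moves stays at roughly $2(n-1)/n$ of one full gradient, i.e. $O(1)$ in $n$.

```python
# Minimal sketch (illustrative, not the paper's SESGD): why Ring-AllReduce
# needs O(n) sequential handshakes even though each worker only moves
# O(1) gradient data in total.

def ring_allreduce(grads):
    """Simulate Ring-AllReduce in place.

    grads: list of n per-worker gradients, each split into n chunks
           (here each chunk is a single float for simplicity).
    After the call, every worker holds the fully summed gradient.
    """
    n = len(grads)
    # Phase 1: reduce-scatter. n-1 steps; in each step every worker sends
    # one chunk to its right neighbor, which accumulates it.
    for step in range(n - 1):
        for w in range(n):
            c = (w - step) % n                    # chunk worker w forwards
            grads[(w + 1) % n][c] += grads[w][c]
    # Phase 2: all-gather. Another n-1 steps; the fully reduced chunks
    # circulate around the ring and overwrite stale partial sums.
    for step in range(n - 1):
        for w in range(n):
            c = (w + 1 - step) % n                # reduced chunk to forward
            grads[(w + 1) % n][c] = grads[w][c]
    # Total: 2 * (n - 1) dependent neighbor handshakes -> latency grows
    # linearly with n, while per-worker traffic is ~2 * (n - 1) / n of one
    # gradient, i.e. O(1) in n.
    return grads


if __name__ == "__main__":
    n = 4
    # Worker w starts with gradient [w+1, ..., w+1], split into n chunks.
    grads = [[float(w + 1)] * n for w in range(n)]
    ring_allreduce(grads)
    assert all(chunk == 10.0 for g in grads for chunk in g)  # 1+2+3+4 = 10
    print(grads)
```

With a fixed per-handshake latency of, say, 5 ms, the $2(n-1)$ sequential steps alone cost 150 ms at $n = 16$ regardless of gradient size, which is the regime where the abstract reports SESGD's largest speedups.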
