Paper Title

Shuffle-Exchange Brings Faster: Reduce the Idle Time During Communication for Decentralized Neural Network Training

Paper Authors

Yang, Xiang

Paper Abstract

As a crucial scheme for accelerating deep neural network (DNN) training, distributed stochastic gradient descent (DSGD) is widely adopted in many real-world applications. In most distributed deep learning (DL) frameworks, DSGD is implemented with the Ring-AllReduce architecture (Ring-SGD) and uses a computation-communication overlap strategy to hide the overhead of the massive communication DSGD requires. However, we observe that although each worker in Ring-SGD needs to communicate only $O(1)$ gradients, the $O(n)$ handshakes required by Ring-SGD limit its usage when training with many workers or in high-latency networks. In this paper, we propose Shuffle-Exchange SGD (SESGD) to resolve the dilemma of Ring-SGD. In a cluster of 16 workers with 0.1 ms Ethernet latency, SESGD accelerates DNN training by up to $1.7\times$ without losing model accuracy. Moreover, in high-latency networks (5 ms), training can be accelerated by up to $5\times$.
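To make the abstract's complexity claim concrete, below is a minimal Python sketch of the standard Ring-AllReduce schedule that Ring-SGD builds on. This is generic background rather than the paper's SESGD algorithm, and the function name and list-based gradient chunks are illustrative assumptions. Each of the $2(n-1)$ steps is a dependent neighbor handshake, so the latency term grows linearly with the number of workers $n$, while the total data each worker moves stays at roughly $2(n-1)/n$ of one full gradient, i.e. $O(1)$ in $n$.

```python
# Minimal sketch (illustrative, not the paper's SESGD): why Ring-AllReduce
# needs O(n) sequential handshakes even though each worker only moves
# O(1) gradient data in total.

def ring_allreduce(grads):
    """Simulate Ring-AllReduce in place.

    grads: list of n per-worker gradients, each split into n chunks
           (here each chunk is a single float for simplicity).
    After the call, every worker holds the fully summed gradient.
    """
    n = len(grads)
    # Phase 1: reduce-scatter. n-1 steps; in each step every worker sends
    # one chunk to its right neighbor, which accumulates it.
    for step in range(n - 1):
        for w in range(n):
            c = (w - step) % n                    # chunk worker w forwards
            grads[(w + 1) % n][c] += grads[w][c]
    # Phase 2: all-gather. Another n-1 steps; the fully reduced chunks
    # circulate around the ring and overwrite stale partial sums.
    for step in range(n - 1):
        for w in range(n):
            c = (w + 1 - step) % n                # reduced chunk to forward
            grads[(w + 1) % n][c] = grads[w][c]
    # Total: 2 * (n - 1) dependent neighbor handshakes -> latency grows
    # linearly with n, while per-worker traffic is ~2 * (n - 1) / n of one
    # gradient, i.e. O(1) in n.
    return grads


if __name__ == "__main__":
    n = 4
    # Worker w starts with gradient [w+1, ..., w+1], split into n chunks.
    grads = [[float(w + 1)] * n for w in range(n)]
    ring_allreduce(grads)
    assert all(chunk == 10.0 for g in grads for chunk in g)  # 1+2+3+4 = 10
    print(grads)
```

With a fixed per-handshake latency of, say, 5 ms, the $2(n-1)$ sequential steps alone cost 150 ms at $n = 16$ regardless of gradient size, which is the regime where the abstract reports SESGD's largest speedups.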
