Paper Title
BaPipe: Exploration of Balanced Pipeline Parallelism for DNN Training
Paper Authors
Paper Abstract
The size of deep neural networks (DNNs) grows rapidly as the complexity of machine learning algorithms increases. To satisfy the computation and memory requirements of DNN training, distributed deep learning based on model parallelism has been widely adopted. We propose BaPipe, a new pipeline parallelism training framework that automatically explores pipeline parallelism training methods and balanced partition strategies for distributed DNN training. In BaPipe, each accelerator computes the forward and backward propagation of a different part of the network, implementing an intra-batch pipeline parallelism strategy. BaPipe uses a new load-balancing automatic exploration strategy that considers both the parameters of the DNN model and the computation, memory, and communication resources of the accelerator cluster. We have trained different DNNs such as VGG-16, ResNet-50, and GNMT on GPU clusters and simulated the performance of different FPGA clusters. Compared with state-of-the-art data parallelism and pipeline parallelism frameworks, BaPipe provides up to 3.2x speedup and 4x memory reduction on various platforms.
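
To make the "balanced partition" idea concrete: the search can be framed as splitting a network's layers into contiguous pipeline stages so that the slowest stage (its compute time plus the cost of shipping activations to the next stage) is as fast as possible. The sketch below is a minimal, hypothetical reconstruction of such a search, not BaPipe's actual algorithm; the function and parameter names (balanced_partition, bw_bytes_per_ms, the toy layer profiles) are illustrative assumptions.

from functools import lru_cache

def balanced_partition(compute_ms, act_bytes, num_stages, bw_bytes_per_ms):
    """Split layers [0..n) into `num_stages` contiguous stages so that the
    bottleneck stage (compute + activation transfer to the next stage) is
    as fast as possible. Returns (bottleneck_ms, stage end indices)."""
    n = len(compute_ms)
    prefix = [0.0]
    for c in compute_ms:
        prefix.append(prefix[-1] + c)

    def stage_cost(i, j):
        # Compute time of layers [i, j), plus the cost of sending layer
        # j-1's activations downstream (the last stage sends nothing).
        comm = act_bytes[j - 1] / bw_bytes_per_ms if j < n else 0.0
        return prefix[j] - prefix[i] + comm

    @lru_cache(maxsize=None)
    def solve(i, stages):
        # Best achievable bottleneck for layers [i, n) using `stages` stages.
        if stages == 1:
            return stage_cost(i, n), (n,)
        best = (float("inf"), ())
        # Leave at least one layer for each of the remaining stages.
        for j in range(i + 1, n - stages + 2):
            rest, cuts = solve(j, stages - 1)
            cand = max(stage_cost(i, j), rest)
            if cand < best[0]:
                best = (cand, (j,) + cuts)
        return best

    bottleneck, cuts = solve(0, num_stages)
    return bottleneck, list(cuts)

if __name__ == "__main__":
    # Made-up per-layer forward+backward times (ms) and activation sizes
    # (bytes) for a toy 8-layer network, standing in for profiled values.
    compute = [2.0, 4.0, 4.0, 8.0, 8.0, 4.0, 2.0, 1.0]
    acts = [1e6, 2e6, 2e6, 4e6, 2e6, 1e6, 1e6, 5e5]
    t, cuts = balanced_partition(compute, acts, num_stages=3,
                                 bw_bytes_per_ms=1e6)
    print(f"bottleneck stage time: {t:.2f} ms, stage boundaries: {cuts}")

A real exploration pass in the spirit of the abstract would additionally reject partitions whose per-stage parameter and activation footprint exceeds each accelerator's memory, and would repeat the search across candidate pipeline schedules and cluster configurations.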