Paper Title
Layer-Wise Partitioning and Merging for Efficient and Scalable Deep Learning
Paper Authors
Paper Abstract
Deep Neural Network (DNN) models are usually trained sequentially from one layer to another, which causes forward, backward and update locking problems, leading to poor performance in terms of training time. Existing parallel strategies that mitigate these problems provide suboptimal runtime performance. In this work, we propose a novel layer-wise partitioning and merging, forward and backward pass parallel framework that provides better training performance. The novelty of the proposed work consists of 1) a layer-wise partitioning and merging model which minimises communication overhead between devices without incurring the memory cost of existing strategies during the training process; 2) a forward pass and backward pass parallelisation and optimisation that addresses the update locking problem and minimises the total training cost. The experimental evaluation on real use cases shows that the proposed method outperforms state-of-the-art approaches in terms of training speed, and achieves an almost linear speedup without compromising the accuracy of the non-parallel approach.
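As a loose illustration of the layer-wise partitioning idea described in the abstract, the minimal PyTorch sketch below splits a model's layers into contiguous partitions that could each be assigned to a separate device. This is a hypothetical example only: the helper `partition_layers`, the toy model, and the partition count are assumptions for illustration, not the authors' framework.

```python
# Hypothetical sketch (not the paper's implementation): partition a DNN's
# layers into contiguous groups, one group per device, so that adjacent
# layers are "merged" into a single partition and fewer activation tensors
# need to be exchanged between devices.
import torch
import torch.nn as nn


def partition_layers(layers, num_partitions):
    """Split a list of layers into `num_partitions` contiguous partitions."""
    per_part = (len(layers) + num_partitions - 1) // num_partitions
    return [nn.Sequential(*layers[i:i + per_part])
            for i in range(0, len(layers), per_part)]


# Example model: a small multilayer perceptron.
layers = [nn.Linear(32, 64), nn.ReLU(),
          nn.Linear(64, 64), nn.ReLU(),
          nn.Linear(64, 10)]

partitions = partition_layers(layers, num_partitions=2)

# Forward pass: in a real setup each partition could sit on a different
# device; here everything stays on CPU and the partitions are simply chained.
x = torch.randn(8, 32)
for part in partitions:
    x = part(x)
print(x.shape)  # torch.Size([8, 10])
```

In this sketch the partition boundaries are the only points where activations would cross devices, which is the intuition behind merging adjacent layers to reduce communication overhead; how the paper balances the partitions and parallelises the forward and backward passes is described in the full text.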