Paper Title
Layer-Parallel Training with GPU Concurrency of Deep Residual Neural Networks via Nonlinear Multigrid
Paper Authors
Paper Abstract
A multigrid Full Approximation Storage (FAS) algorithm for solving deep residual networks is developed to enable layer-parallel training of neural networks and concurrent execution of computational kernels on GPUs. This work demonstrates a 10.2x speedup over traditional layer-wise model parallelism using the same number of compute units.