Paper Title

Layer-Parallel Training with GPU Concurrency of Deep Residual Neural Networks via Nonlinear Multigrid

Authors

Kirby, Andrew C., Samsi, Siddharth, Jones, Michael, Reuther, Albert, Kepner, Jeremy, Gadepally, Vijay

Abstract

A Multigrid Full Approximation Storage algorithm for solving Deep Residual Networks is developed to enable neural network parallelized layer-wise training and concurrent computational kernel execution on GPUs. This work demonstrates a 10.2x speedup over traditional layer-wise model parallelism techniques using the same number of compute units.
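The abstract rests on the standard observation that a deep residual network's forward pass is a forward-Euler time march, which is what lets multigrid-in-time methods parallelize across layers. The sketch below is a minimal illustration of that viewpoint only, not the paper's algorithm; the shared-weight block `F`, the layer counts, and the step size `h` are all hypothetical toy choices.

```python
# Minimal sketch (not the authors' code): a ResNet forward pass viewed as
# forward Euler on u' = F(u), the viewpoint underlying layer-parallel
# multigrid training. F is a hypothetical fixed tanh block.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4)) * 0.1  # hypothetical shared block weights


def F(u):
    """One residual block's correction term."""
    return np.tanh(W @ u)


def forward(u0, n_layers, h=1.0):
    """Serial layer-by-layer propagation: u_{k+1} = u_k + h * F(u_k)."""
    u = u0
    for _ in range(n_layers):
        u = u + h * F(u)
    return u


# A coarse "grid" uses half the layers with twice the step size, giving a
# cheap approximation of the fine propagation; multigrid methods (such as
# the FAS scheme named in the abstract) iterate between such levels so the
# fine-level layer updates can run concurrently instead of sequentially.
u0 = rng.standard_normal(4)
fine = forward(u0, n_layers=8, h=0.5)
coarse = forward(u0, n_layers=4, h=1.0)
gap = np.linalg.norm(fine - coarse)  # small gap: coarse tracks fine
```

With zero layers the propagation is the identity, and halving the layer count while doubling `h` keeps the total "time" integrated the same, which is why the coarse level stays a useful approximation.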
