Paper Title

Survey on Large Scale Neural Network Training

Paper Authors

Julia Gusak, Daria Cherniuk, Alena Shilova, Alexander Katrutsa, Daniel Bershatsky, Xunyi Zhao, Lionel Eyraud-Dubois, Oleg Shlyazhko, Denis Dimitrov, Ivan Oseledets, Olivier Beaumont

Paper Abstract

Modern Deep Neural Networks (DNNs) require significant memory to store weights, activations, and other intermediate tensors during training. As a result, many models do not fit on a single GPU device, or can only be trained with a small per-GPU batch size. This survey provides a systematic overview of the approaches that enable more efficient DNN training. We analyze techniques that save memory and make good use of computation and communication resources on architectures with one or several GPUs. We summarize the main categories of strategies and compare strategies within and across categories. Along with the approaches proposed in the literature, we discuss available implementations.
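As a concrete illustration of one family of memory-saving techniques the survey covers, the sketch below uses activation (gradient) checkpointing via PyTorch's `torch.utils.checkpoint`, which trades extra recomputation in the backward pass for lower peak activation memory. This is a minimal example under assumed model sizes, not an implementation from the paper itself; the `CheckpointedMLP` class and its dimensions are hypothetical.

```python
# Minimal sketch of activation checkpointing (a memory-saving training
# technique of the kind surveyed); model and sizes are hypothetical.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class CheckpointedMLP(nn.Module):
    def __init__(self, dim=1024, depth=8):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(depth)]
        )

    def forward(self, x):
        for block in self.blocks:
            # Activations inside `block` are not kept for backward; they are
            # recomputed during the backward pass, reducing peak memory.
            x = checkpoint(block, x, use_reentrant=False)
        return x


model = CheckpointedMLP()
x = torch.randn(32, 1024, requires_grad=True)
loss = model(x).sum()
loss.backward()  # recomputes checkpointed activations on the fly
```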
