Paper Title
Data optimization for large batch distributed training of deep neural networks
Paper Authors
Paper Abstract
Distributed training in deep learning (DL) is common practice as data and models grow. Current practice for distributed training of deep neural networks faces the challenges of communication bottlenecks when operating at scale and of model accuracy deterioration as the global batch size increases. Present solutions focus on improving message-exchange efficiency and on techniques that tweak batch sizes and models during training. The loss of training accuracy typically occurs because the loss function gets trapped in a local minimum. We observe that the loss landscape is shaped by both the model and the training data, and we propose a data optimization approach that uses machine learning to implicitly smooth out the loss landscape, resulting in fewer local minima. Our approach filters out data points that are less important to feature learning, enabling us to speed up the training of models at larger batch sizes with improved accuracy.
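To make the filtering idea in the abstract concrete, below is a minimal, hypothetical sketch of one way low-importance data points could be dropped before forming large global batches. The per-example importance score and the `keep_fraction` parameter are illustrative assumptions, not details taken from the paper; the actual selection criterion may differ.

```python
import numpy as np

def filter_low_importance(examples, importance_scores, keep_fraction=0.8):
    """Drop the fraction of examples with the lowest importance scores.

    `importance_scores` is a hypothetical per-example score (e.g. a loss
    value under a reference model); the paper's actual scoring criterion
    may differ.
    """
    scores = np.asarray(importance_scores)
    k = max(1, int(len(examples) * keep_fraction))
    # Indices of the k highest-scoring examples.
    keep_idx = np.argsort(scores)[-k:]
    return [examples[i] for i in keep_idx]

# Hypothetical usage: keep the 80% most "important" examples before
# sharding the dataset into large global batches for distributed training.
examples = list(range(1000))        # placeholder dataset
scores = np.random.rand(1000)       # placeholder importance scores
filtered = filter_low_importance(examples, scores, keep_fraction=0.8)
print(len(filtered))                # -> 800
```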