Paper Title
Enabling Fast and Flexible Distributed Deep Learning with Programmable Switches
Paper Authors
Paper Abstract
Deep learning has been applied across a wide range of areas and has produced major breakthroughs. With ever-increasing model sizes and training data volumes, distributed deep learning has emerged, which uses a cluster of machines to train a model in parallel. Unfortunately, performance is often far from linear speedup due to the communication overhead between cluster nodes. To address this challenge, this paper designs and implements Libra, a network aggregator that leverages in-network computation to optimize communication for distributed DL training in two ways: 1) reducing active connections and 2) aggregating exchanged network packets. We implemented Libra on Intel Tofino switches, customized a lightweight host stack, and integrated it into the open-source training framework PS-lite. Experimental results show that Libra achieves a 1.5x to 4x speedup.
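To make the in-network aggregation idea concrete, below is a minimal Python sketch of the switch-side semantics the abstract describes, not Libra's actual data plane (which runs on the Tofino switch itself). The names `Aggregator`, `GradientPacket`, and `NUM_WORKERS` are illustrative assumptions: the aggregator accumulates per-key gradient chunks from all workers and releases a single aggregated packet once every worker has contributed, so each worker exchanges one packet with the aggregator instead of one per peer.

```python
# Minimal sketch (assumed names, not Libra's real implementation) of
# switch-side gradient aggregation: absorb per-worker packets, emit one
# aggregated packet per key once all workers have contributed.
from dataclasses import dataclass
from typing import Dict, List, Optional

NUM_WORKERS = 4  # assumed cluster size

@dataclass
class GradientPacket:
    key: int              # parameter-slice identifier
    worker_id: int
    values: List[float]   # fixed-size gradient chunk

class Aggregator:
    """Emulates the per-key accumulation a programmable switch would do."""
    def __init__(self) -> None:
        self.partial: Dict[int, List[float]] = {}
        self.count: Dict[int, int] = {}

    def on_packet(self, pkt: GradientPacket) -> Optional[List[float]]:
        acc = self.partial.setdefault(pkt.key, [0.0] * len(pkt.values))
        for i, v in enumerate(pkt.values):
            acc[i] += v  # element-wise accumulate in "switch registers"
        self.count[pkt.key] = self.count.get(pkt.key, 0) + 1
        if self.count[pkt.key] == NUM_WORKERS:
            # Last contribution arrived: release one aggregated packet,
            # which would be multicast back to all workers.
            del self.count[pkt.key]
            return self.partial.pop(pkt.key)
        return None  # packet absorbed; nothing forwarded

# Usage: four workers each push a chunk for key 0; the aggregate is
# released only on the final contribution.
agg = Aggregator()
out = None
for w in range(NUM_WORKERS):
    out = agg.on_packet(GradientPacket(key=0, worker_id=w, values=[1.0, 2.0]))
print(out)  # [4.0, 8.0]
```

Because intermediate packets are absorbed rather than forwarded, the fabric carries one aggregated result per parameter slice instead of one gradient copy per worker, which is the communication reduction the abstract claims.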