Paper title
Pisces: Efficient Federated Learning via Guided Asynchronous Training
Paper authors
Paper abstract
Federated learning (FL) is typically performed in a synchronous parallel manner, where the involvement of a slow client delays a training iteration. Current FL systems employ a participant selection strategy to select fast clients with quality data in each iteration. However, this is not always possible in practice, and the selection strategy often has to navigate an unpleasant trade-off between the speed and the data quality of clients. In this paper, we present Pisces, an asynchronous FL system with intelligent participant selection and model aggregation for accelerated training. To avoid incurring excessive resource cost and stale training computation, Pisces uses a novel scoring mechanism to identify suitable clients to participate in a training iteration. It also adapts the pace of model aggregation to dynamically bound the progress gap between the selected clients and the server, with a provable convergence guarantee in a smooth non-convex setting. We have implemented Pisces in an open-source FL platform called Plato, and evaluated its performance in large-scale experiments with popular vision and language models. Pisces outperforms the state-of-the-art synchronous and asynchronous schemes, accelerating the time-to-accuracy by up to 2.0x and 1.9x, respectively.