Paper title
Just-in-Time Aggregation for Federated Learning
Paper authors
Paper abstract
The increasing number and scale of federated learning (FL) jobs necessitate resource-efficient scheduling and management of aggregation to make cloud-hosted aggregation economically viable. Existing FL research has focused on the design of FL algorithms and optimizations, and less on the efficacy of aggregation. Existing FL platforms often employ aggregators that actively wait for model updates. This wastes computational resources in the cloud, especially in large-scale FL settings where parties are only intermittently available for training. In this paper, we propose a new FL aggregation paradigm, "just-in-time" (JIT) aggregation, which leverages unique properties of FL jobs, especially the periodicity of model updates, to defer aggregation as much as possible and free compute resources for other FL jobs or other datacenter workloads. We describe a novel way to prioritize FL jobs for aggregation, and demonstrate, using multiple datasets, models, and FL aggregation algorithms, that our techniques can reduce resource usage by 60% or more compared to the eager aggregation used in existing FL platforms. We also demonstrate that JIT aggregation adds negligible overhead and has negligible impact on the latency of the FL job.
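The contrast between eager and deferred aggregation can be illustrated with a toy model. The sketch below is not the scheduling or prioritization method from the paper. It assumes a single FL round in which updates are fused only after the last one arrives, and every name (`Round`, `round_length`, `aggregation_time`) and every number is hypothetical, chosen only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class Round:
    """One FL round, in seconds. Both fields are hypothetical inputs."""
    round_length: float      # time from round start until the last update arrives
    aggregation_time: float  # time needed to fuse the updates once all are present


def eager_busy_time(r: Round) -> float:
    # Eager (assumed model): the aggregator is deployed at round start,
    # idles while waiting for updates, then aggregates.
    return r.round_length + r.aggregation_time


def deferred_busy_time(r: Round) -> float:
    # Deferred (assumed model): the aggregator is deployed only when the
    # updates are available, so it holds resources just for the aggregation.
    return r.aggregation_time


def resource_saving(r: Round) -> float:
    # Fraction of aggregator time saved by deferring, under the toy model.
    return 1.0 - deferred_busy_time(r) / eager_busy_time(r)


if __name__ == "__main__":
    example = Round(round_length=300.0, aggregation_time=100.0)
    print(f"eager busy time:    {eager_busy_time(example):.0f} s")
    print(f"deferred busy time: {deferred_busy_time(example):.0f} s")
    print(f"saving:             {resource_saving(example):.0%}")
```

In this toy model the saving grows with the ratio of waiting time to aggregation time, which is consistent with the abstract's point that actively waiting aggregators are most wasteful when parties are only intermittently available. The figures printed here are illustrative and are not results from the paper.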