在计算和统计异质性下有效安排联合移动设备

论文标题

在计算和统计异质性下有效安排联合移动设备

Towards Efficient Scheduling of Federated Mobile Devices under Computational and Statistical Heterogeneity

论文作者

Wang, Cong, Yang, Yuanyuan, Zhou, Pengzhan

论文摘要

联合学习源自分布式学习，仅通过共享模型参数来启用新的抽象级别隐私协作。尽管当前的研究主要集中于优化学习算法并最大程度地减少通过分布式学习留下的沟通开销，但在移动设备上实现的真实实现仍然存在很大的差距。在本文中，我们从一个经验实验开始，以证明计算异质性比当前电池供电的移动设备上的通信更为明显，并且现有方法被移动散乱者所困扰。此外，在移动用户中的非相同分配数据使参与者的选择对准确性和融合至关重要。为了应对计算和统计异质性，我们将数据用作调整旋钮，并提出两种有效的多项式时间算法，以在数据相同或非分布的数据时，在各种移动设备上安排不同的工作负载。对于相同分布的数据，我们将分区和线性瓶颈分配结合在一起，以达到近乎最佳的训练时间而不会准确损失。对于非相同分布的数据，我们将其转换为平均成本最小化问题，并提出一种贪婪的算法，以在计算时间和准确性之间找到合理的平衡。我们还建立了一个离线探测器来量化不同设备的运行时行为，该行为是调度算法的输入。我们在带有两个数据集和多达20个设备的移动测试床上进行了广泛的实验。与共同的基准相比，提出的算法以2-100倍加速时期的速度，准确性增长2-7％，并在CIFAR10上提高收敛速度超过100％。

Originated from distributed learning, federated learning enables privacy-preserved collaboration on a new abstracted level by sharing the model parameters only. While the current research mainly focuses on optimizing learning algorithms and minimizing communication overhead left by distributed learning, there is still a considerable gap when it comes to the real implementation on mobile devices. In this paper, we start with an empirical experiment to demonstrate computation heterogeneity is a more pronounced bottleneck than communication on the current generation of battery-powered mobile devices, and the existing methods are haunted by mobile stragglers. Further, non-identically distributed data across the mobile users makes the selection of participants critical to the accuracy and convergence. To tackle the computational and statistical heterogeneity, we utilize data as a tuning knob and propose two efficient polynomial-time algorithms to schedule different workloads on various mobile devices, when data is identically or non-identically distributed. For identically distributed data, we combine partitioning and linear bottleneck assignment to achieve near-optimal training time without accuracy loss. For non-identically distributed data, we convert it into an average cost minimization problem and propose a greedy algorithm to find a reasonable balance between computation time and accuracy. We also establish an offline profiler to quantify the runtime behavior of different devices, which serves as the input to the scheduling algorithms. We conduct extensive experiments on a mobile testbed with two datasets and up to 20 devices. Compared with the common benchmarks, the proposed algorithms achieve 2-100x speedup epoch-wise, 2-7% accuracy gain and boost the convergence rate by more than 100% on CIFAR10.

下载PDF全文

下载文献需遵守相关版权规定

论文标题