Paper Title
Heterogeneous Federated Learning on a Graph
Paper Authors
Paper Abstract
Federated learning, where algorithms are trained across multiple decentralized devices without sharing local data, is increasingly popular in distributed machine learning practice. Typically, a graph structure $G$ underlies the local devices and governs their communication. In this work, we consider parameter estimation in federated learning with data-distribution and communication heterogeneity, as well as limited computational capacity of local devices. We encode the distribution heterogeneity by parametrizing the distributions on local devices with a set of distinct $p$-dimensional vectors. We then propose to jointly estimate the parameters of all devices under the $M$-estimation framework with fused Lasso regularization, which encourages equal parameter estimates on devices connected in $G$. We provide a general result for our estimator that depends on $G$ and can be further calibrated to obtain convergence rates for various specific problem setups. Surprisingly, our estimator attains the optimal rate under a certain graph fidelity condition on $G$, as if we could aggregate all samples sharing the same distribution. If the graph fidelity condition is not met, we propose an edge selection procedure via multiple testing to ensure optimality. To ease the burden of local computation, a decentralized stochastic version of ADMM is provided, with convergence rate $O(T^{-1}\log T)$, where $T$ denotes the number of iterations. We highlight that our algorithm transmits only parameters along the edges of $G$ at each iteration, without requiring a central machine, which preserves privacy. We further extend it to the case where devices are randomly inaccessible during training, with a similar algorithmic convergence guarantee. The computational and statistical efficiency of our method is evidenced by simulation experiments and the 2020 US presidential election data set.
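To make the estimation target concrete, a plausible form of the penalized objective described in the abstract is sketched below; the local loss $\ell$, local sample sizes $n_i$, and tuning parameter $\lambda$ are illustrative notation introduced here, not taken from the paper:

$$
\widehat{\theta} \;=\; \operatorname*{arg\,min}_{\theta_1,\dots,\theta_m \in \mathbb{R}^p} \; \sum_{i=1}^{m} \frac{1}{n_i} \sum_{j=1}^{n_i} \ell\bigl(\theta_i ;\, x_{ij}\bigr) \;+\; \lambda \sum_{(i,k) \in E(G)} \bigl\lVert \theta_i - \theta_k \bigr\rVert_2 ,
$$

where device $i$ holds local samples $x_{i1},\dots,x_{i n_i}$ and $E(G)$ is the edge set of the communication graph $G$. The fused-Lasso penalty shrinks parameter differences across edges toward zero, producing equal estimates on connected devices whenever the penalty is active.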
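As a rough illustration of the decentralized stochastic ADMM idea (a local mini-batch gradient step on each device, a closed-form per-edge soft-thresholding step, and dual updates kept only along edges of $G$), the following sketch assumes a quadratic local loss and hypothetical hyperparameters (lam, rho, step, batch). It is a minimal sketch of the general technique, not the paper's exact algorithm.

# Minimal sketch (not the paper's algorithm) of decentralized stochastic ADMM for
# the fused-Lasso objective above, assuming a quadratic local loss
# ||X_i theta_i - y_i||^2 / (2 n_i) purely for illustration. Each device keeps its
# data local and exchanges only its current parameter with neighbors in G.
import numpy as np

def group_soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_2 (the per-edge fused-Lasso penalty).
    norm = np.linalg.norm(v)
    return np.zeros_like(v) if norm <= tau else (1.0 - tau / norm) * v

def decentralized_stochastic_admm(data, edges, lam=0.1, rho=1.0, step=0.05,
                                  T=200, batch=8, seed=0):
    # data: list of (X_i, y_i) per device; edges: list of (i, k) pairs in G.
    rng = np.random.default_rng(seed)
    p = data[0][0].shape[1]
    m = len(data)
    theta = [np.zeros(p) for _ in range(m)]      # local parameters
    z = {e: np.zeros(p) for e in edges}          # per-edge difference variables
    u = {e: np.zeros(p) for e in edges}          # per-edge dual variables
    neighbors = {i: [] for i in range(m)}
    for (i, k) in edges:
        neighbors[i].append((k, +1.0, (i, k)))
        neighbors[k].append((i, -1.0, (i, k)))
    for t in range(T):
        # 1) Local (stochastic) primal step: one mini-batch gradient step on the
        #    augmented Lagrangian; only neighbors' parameters are needed.
        for i in range(m):
            X, y = data[i]
            idx = rng.choice(len(y), size=min(batch, len(y)), replace=False)
            grad = X[idx].T @ (X[idx] @ theta[i] - y[idx]) / len(idx)
            for (k, sign, e) in neighbors[i]:
                diff = theta[i] - theta[k] if sign > 0 else theta[k] - theta[i]
                grad += sign * (u[e] + rho * (diff - z[e]))
            theta[i] = theta[i] - step * grad
        # 2) Per-edge update of z: group soft-thresholding (closed form).
        # 3) Per-edge dual update; no central machine is involved.
        for (i, k) in edges:
            d = theta[i] - theta[k]
            z[(i, k)] = group_soft_threshold(d + u[(i, k)] / rho, lam / rho)
            u[(i, k)] += rho * (d - z[(i, k)])
    return theta

Only the current parameters travel along edges of $G$ in step 1, mirroring the communication pattern described in the abstract; handling randomly inaccessible devices would amount to skipping the updates of the unavailable devices and their incident edges in a given iteration.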