On the Unreasonable Effectiveness of Federated Averaging with Heterogeneous Data

Title

On the Unreasonable Effectiveness of Federated Averaging with Heterogeneous Data

Authors

Wang, Jianyu, Das, Rudrajit, Joshi, Gauri, Kale, Satyen, Xu, Zheng, Zhang, Tong

Abstract

Existing theory predicts that data heterogeneity will degrade the performance of the Federated Averaging (FedAvg) algorithm in federated learning. However, in practice, the simple FedAvg algorithm converges very well. This paper explains the seemingly unreasonable effectiveness of FedAvg that contradicts the previous theoretical predictions. We find that the key assumption of bounded gradient dissimilarity in previous theoretical analyses is too pessimistic to characterize data heterogeneity in practical applications. For a simple quadratic problem, we demonstrate that there exist regimes where large gradient dissimilarity does not have any negative impact on the convergence of FedAvg. Motivated by this observation, we propose a new quantity, average drift at optimum, to measure the effects of data heterogeneity, and explicitly use it to present a new theoretical analysis of FedAvg. We show that the average drift at optimum is nearly zero across many real-world federated training tasks, whereas the gradient dissimilarity can be large. Our new analysis suggests that FedAvg can have identical convergence rates in homogeneous and heterogeneous data settings, and hence leads to a better understanding of its empirical success.
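The quadratic argument in the abstract can be illustrated with a small toy model. This is our own sketch, not the paper's experiments, and it assumes scalar clients f_i(x) = 0.5 * a_i * (x - c_i)^2 trained with full-batch local gradient steps. If every client shares the same curvature a, then after τ local steps with learning rate η a client that starts from x ends at c_i + (1 - ηa)^τ (x - c_i); averaging over clients gives x* + (1 - ηa)^τ (x - x*), because x* is the mean of the c_i. A FedAvg round started at x* therefore stays at x*, even though the local gradients a(x* - c_i) can be large. All function names below, and the drift proxy `drift_at_opt`, are hypothetical and are not the paper's formal definition of average drift at optimum.

```python
# Toy 1-D FedAvg sketch (hypothetical setup, not the paper's experiments):
# client i owns f_i(x) = 0.5 * a_i * (x - c_i) ** 2 and the global objective
# is the average of the f_i over all clients.


def global_optimum(a, c):
    """Minimizer of (1/M) * sum_i 0.5 * a_i * (x - c_i)^2."""
    return sum(ai * ci for ai, ci in zip(a, c)) / sum(a)


def local_gd(x, ai, ci, lr, steps):
    """Run `steps` full-batch gradient steps on one client's quadratic."""
    for _ in range(steps):
        x -= lr * ai * (x - ci)
    return x


def fedavg_round(x, a, c, lr, steps):
    """All clients start from the global model x; the server averages their results."""
    local_models = [local_gd(x, ai, ci, lr, steps) for ai, ci in zip(a, c)]
    return sum(local_models) / len(local_models)


def gradient_dissimilarity_at_opt(a, c):
    """Mean squared local gradient at x* (the global gradient there is zero)."""
    x_star = global_optimum(a, c)
    grads = [ai * (x_star - ci) for ai, ci in zip(a, c)]
    return sum(g * g for g in grads) / len(grads)


def drift_at_opt(a, c, lr, steps):
    """Hypothetical drift proxy: how far one FedAvg round moves a model that starts at x*."""
    x_star = global_optimum(a, c)
    return fedavg_round(x_star, a, c, lr, steps) - x_star


def run_fedavg(a, c, x0, lr=0.1, steps=10, rounds=50):
    """Run several FedAvg rounds from the initial model x0."""
    x = x0
    for _ in range(rounds):
        x = fedavg_round(x, a, c, lr, steps)
    return x


centers = [-10.0, 0.0, 10.0]   # very different local optima
same_curv = [1.0, 1.0, 1.0]    # identical curvature on every client
mixed_curv = [0.5, 1.0, 2.0]   # curvature also differs across clients

print(gradient_dissimilarity_at_opt(same_curv, centers))  # ≈ 66.67: large dissimilarity
print(drift_at_opt(same_curv, centers, 0.1, 10))          # 0.0: no drift at the optimum
print(drift_at_opt(mixed_curv, centers, 0.1, 10))         # roughly -1.14: nonzero drift
print(run_fedavg(same_curv, centers, x0=5.0))             # essentially 0.0, i.e. x*
print(run_fedavg(mixed_curv, centers, x0=5.0))            # settles near 2.53, away from x* ≈ 4.29
```

In this toy setting the gradient dissimilarity is large in both cases, yet only the mixed-curvature case pulls FedAvg away from the global optimum; moving `mixed_curv` toward equal values shrinks the drift toward zero. Pure-Python scalars keep the sketch dependency-free.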
