Title
Personalized Federated Learning: A Meta-Learning Approach
Authors
Abstract
In Federated Learning, we aim to train models across multiple computing units (users), while users can only communicate with a common central server, without exchanging their data samples. This mechanism exploits the computational power of all users and allows users to obtain a richer model, as their models are trained over a larger set of data points. However, this scheme only develops a common output for all the users and therefore does not adapt the model to each user. This is an important missing feature, especially given the heterogeneity of the underlying data distributions across users. In this paper, we study a personalized variant of federated learning in which our goal is to find an initial shared model that current or new users can easily adapt to their local dataset by performing one or a few steps of gradient descent with respect to their own data. This approach keeps all the benefits of the federated learning architecture and, by structure, leads to a more personalized model for each user. We show that this problem can be studied within the Model-Agnostic Meta-Learning (MAML) framework. Inspired by this connection, we study a personalized variant of the well-known Federated Averaging algorithm and evaluate its performance in terms of the gradient norm for non-convex loss functions. Further, we characterize how this performance is affected by the closeness of the underlying distributions of user data, measured in terms of distribution distances such as the Total Variation and 1-Wasserstein metrics.
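The scheme described above can be sketched in a few lines. The following is a minimal toy illustration, not the paper's exact algorithm: each user holds a simple quadratic loss f_i(w) = ½‖w − c_i‖², performs one inner gradient-descent adaptation step (the "one or a few steps" in the abstract), the server averages the resulting meta-updates, and each user finally personalizes the shared initial model with one local step. The user losses, step sizes `alpha`/`beta`, and round count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
centers = rng.normal(size=(5, 2))          # c_i: each user's local optimum (assumed toy data)

def grad(i, w):                            # ∇f_i(w) = w − c_i for the quadratic loss
    return w - centers[i]

def loss(i, w):                            # f_i(w) = ½‖w − c_i‖²
    return 0.5 * np.sum((w - centers[i]) ** 2)

alpha, beta, rounds = 0.1, 0.5, 200        # inner/outer step sizes (illustrative)
w = np.zeros(2)                            # shared initial model

for _ in range(rounds):
    updates = []
    for i in range(len(centers)):
        # one inner adaptation step on user i's own data
        w_adapted = w - alpha * grad(i, w)
        # gradient of the MAML-style objective f_i(w − α∇f_i(w));
        # for this quadratic the Hessian is the identity, so the
        # correction factor (I − α∇²f_i) reduces to (1 − α)
        meta_grad = (1 - alpha) * grad(i, w_adapted)
        updates.append(w - beta * meta_grad)
    w = np.mean(updates, axis=0)           # server averages the local models

# after training, each user personalizes the shared model with one gradient step
shared_losses = [loss(i, w) for i in range(len(centers))]
pers_losses = [loss(i, w - alpha * grad(i, w)) for i in range(len(centers))]
print(np.mean(shared_losses), np.mean(pers_losses))
```

On heterogeneous data (distinct c_i), the average loss after the single personalization step is strictly smaller than the loss of the shared model alone, which is the benefit the abstract attributes to the personalized formulation.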