Title

Federated Learning with GAN-based Data Synthesis for Non-IID Clients

Authors

Zijian Li, Jiawei Shao, Yuyi Mao, Jessie Hui Wang, Jun Zhang

Abstract

Federated learning (FL) has recently emerged as a popular privacy-preserving collaborative learning paradigm. However, it suffers from the non-independent and identically distributed (non-IID) data among clients. In this paper, we propose a novel framework, named Synthetic Data Aided Federated Learning (SDA-FL), to resolve this non-IID challenge by sharing synthetic data. Specifically, each client pretrains a local generative adversarial network (GAN) to generate differentially private synthetic data, which are uploaded to the parameter server (PS) to construct a global shared synthetic dataset. To generate confident pseudo labels for the synthetic dataset, we also propose an iterative pseudo labeling mechanism performed by the PS. A combination of the local private dataset and synthetic dataset with confident pseudo labels leads to nearly identical data distributions among clients, which improves the consistency among local models and benefits the global aggregation. Extensive experiments evidence that the proposed framework outperforms the baseline methods by a large margin in several benchmark datasets under both the supervised and semi-supervised settings.
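
Since only the abstract is reproduced here, the following is a minimal, schematic sketch of the SDA-FL workflow it describes, assuming a simple FedAvg-style setup: each client contributes synthetic samples to a shared pool, the parameter server assigns confident pseudo labels with the current global model each round, and locally updated models are averaged. All names (`LocalClient`, `ParameterServer`, `pretrain_gan_and_synthesize`, `CONF_THRESHOLD`, etc.) are illustrative placeholders rather than the authors' code, and GAN training, differential privacy, and the actual model updates are stubbed with toy NumPy operations.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_CLIENTS, NUM_CLASSES, DIM, SYN_PER_CLIENT = 3, 10, 16, 50
CONF_THRESHOLD = 0.8  # assumed confidence cutoff for keeping pseudo labels


class LocalClient:
    """One client holding a non-IID private dataset (toy stand-in)."""

    def __init__(self):
        classes = rng.choice(NUM_CLASSES, size=3, replace=False)  # few classes per client
        self.x = rng.normal(size=(200, DIM))
        self.y = rng.choice(classes, size=200)

    def pretrain_gan_and_synthesize(self):
        # Stand-in for pretraining a local differentially private GAN and
        # sampling synthetic examples from it.
        return rng.normal(size=(SYN_PER_CLIENT, DIM))

    def local_update(self, global_model, syn_x, syn_y):
        # Stand-in for local training on private data plus the pseudo-labeled
        # synthetic data; here it just perturbs the global model.
        return global_model + 0.01 * rng.normal(size=global_model.shape)


class ParameterServer:
    """PS that pseudo-labels the shared synthetic pool and aggregates models."""

    def __init__(self):
        self.global_model = rng.normal(size=(DIM, NUM_CLASSES))  # linear classifier weights

    def pseudo_label(self, syn_x):
        # Label synthetic samples with the current global model and keep only
        # confident predictions (the iterative pseudo-labeling step).
        logits = syn_x @ self.global_model
        probs = np.exp(logits - logits.max(axis=1, keepdims=True))
        probs /= probs.sum(axis=1, keepdims=True)
        conf, labels = probs.max(axis=1), probs.argmax(axis=1)
        keep = conf >= CONF_THRESHOLD
        return syn_x[keep], labels[keep]

    def aggregate(self, local_models):
        # FedAvg-style averaging of the locally updated models.
        self.global_model = np.mean(local_models, axis=0)


clients = [LocalClient() for _ in range(NUM_CLIENTS)]
server = ParameterServer()

# Build the globally shared synthetic dataset from every client's local GAN.
shared_syn = np.concatenate([c.pretrain_gan_and_synthesize() for c in clients])

# Federated rounds: re-label the shared pool, run local updates, aggregate.
for rnd in range(5):
    syn_x, syn_y = server.pseudo_label(shared_syn)
    updates = [c.local_update(server.global_model, syn_x, syn_y) for c in clients]
    server.aggregate(updates)
    print(f"round {rnd}: kept {len(syn_x)} confident synthetic samples")
```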
