Demystifying Swarm Learning: A New Paradigm of Blockchain-based Decentralized Federated Learning

Paper Title

Demystifying Swarm Learning: A New Paradigm of Blockchain-based Decentralized Federated Learning

Authors

Han, Jialiang; Ma, Yun; Han, Yudong

Abstract

Federated learning (FL) is an emerging, promising privacy-preserving machine learning paradigm that has attracted growing attention from researchers and developers. FL keeps users' private data on their devices and exchanges the gradients of local models to cooperatively train a shared deep learning (DL) model on central custodians. However, the security and fault tolerance of FL have been increasingly questioned, because its central custodian mechanism, or star-shaped architecture, can be vulnerable to malicious attacks or software failures. To address these problems, Swarm Learning (SL) introduces a permissioned blockchain to securely onboard members and dynamically elect a leader, which allows DL to be performed in a highly decentralized manner. Despite the considerable attention SL has received, few empirical studies of SL or blockchain-based decentralized FL provide comprehensive knowledge of best practices and precautions for deploying SL in real-world scenarios. To the best of our knowledge, we therefore conduct the first comprehensive study of SL to date, to fill the knowledge gap between SL deployment and developers. In this paper, we conduct experiments on 3 public datasets to answer 5 research questions, present interesting findings, quantitatively analyze the reasons behind these findings, and provide developers and researchers with practical suggestions. The findings indicate that SL is suitable for most application scenarios, regardless of whether the dataset is balanced, polluted, or biased over irrelevant features.
