Two-Phase Multi-Party Computation Enabled Privacy-Preserving Federated Learning

Title

Two-Phase Multi-Party Computation Enabled Privacy-Preserving Federated Learning

Authors

Kanagavelu, Renuga, Li, Zengxiang, Samsudin, Juniarto, Yang, Yechao, Yang, Feng, Goh, Rick Siow Mong, Cheah, Mervyn, Wiwatphonthana, Praewpiraya, Akkarajitsakul, Khajonpong, Wang, Shangguang

Abstract

Countries across the globe have been pushing strict regulations on the protection of collected personal or private data. The traditional centralized machine learning method, in which data is collected from end-users or IoT devices so that insights can be discovered in real-world data, may not be feasible for many data-driven industry applications in light of such regulations. A new machine learning method, coined by Google as Federated Learning (FL), enables multiple participants to train a machine learning model collectively without directly exchanging data. However, recent studies have shown that the shared models can still be exploited to extract personal or confidential data. In this paper, we propose to adopt Multi-Party Computation (MPC) to achieve privacy-preserving model aggregation for FL. MPC-enabled model aggregation performed in a peer-to-peer manner incurs high communication overhead and scales poorly. To address this problem, we propose a two-phase mechanism that 1) elects a small committee and 2) provides an MPC-enabled model aggregation service to a larger number of participants through the committee. The MPC-enabled FL framework has been integrated into an IoT platform for smart manufacturing. It enables a set of companies to train high-quality models collectively by leveraging their complementary datasets on their own premises, without compromising privacy, model accuracy relative to traditional machine learning methods, or execution efficiency in terms of communication cost and execution time.
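The abstract does not say which MPC protocol the committee runs. As a minimal sketch, assuming additive secret sharing over a prime field (a common building block for secure aggregation, not necessarily the authors' scheme), the two phases could look like the following. The committee election by uniform random sampling, the modulus `PRIME`, the fixed-point `SCALE`, and all function names are hypothetical, and the whole exchange is simulated in a single process.

```python
import random
from typing import Dict, List

PRIME = 2**61 - 1  # hypothetical field modulus (a Mersenne prime)
SCALE = 10**6      # hypothetical fixed-point scale for float weights


def encode(x: float) -> int:
    """Map a float weight to a field element using fixed-point encoding."""
    return round(x * SCALE) % PRIME


def decode(v: int) -> float:
    """Map a field element back to a signed float."""
    if v > PRIME // 2:
        v -= PRIME
    return v / SCALE


def elect_committee(ids: List[int], size: int, rng: random.Random) -> List[int]:
    """Phase 1 (hypothetical rule): pick a small committee uniformly at random."""
    return rng.sample(ids, size)


def share(value: int, n: int, rng: random.Random) -> List[int]:
    """Split value into n additive shares that sum to value mod PRIME."""
    shares = [rng.randrange(PRIME) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_average(updates: Dict[int, List[float]], committee_size: int,
                   seed: int = 0) -> List[float]:
    """Phase 2: participants secret-share their updates with the committee,
    which reveals only the aggregate (here, the element-wise mean)."""
    rng = random.Random(seed)
    committee = elect_committee(sorted(updates), committee_size, rng)
    dim = len(next(iter(updates.values())))
    # Each committee member keeps its own running sum of the shares it receives.
    partial = {c: [0] * dim for c in committee}
    for weights in updates.values():
        for j, w in enumerate(weights):
            for c, s in zip(committee, share(encode(w), committee_size, rng)):
                partial[c][j] = (partial[c][j] + s) % PRIME
    # Committee members combine their partial sums; no single member ever
    # sees an individual participant's update in the clear.
    totals = [sum(partial[c][j] for c in committee) % PRIME for j in range(dim)]
    return [decode(t) / len(updates) for t in totals]
```

For example, `secure_average({0: [0.5, -1.0], 1: [1.5, 2.0], 2: [1.0, 0.5]}, committee_size=2)` returns the element-wise mean `[1.0, 0.5]`. In this sketch each participant sends one share per committee member, so the number of messages grows with the committee size rather than with the total number of participants, as it would if every participant exchanged shares with every other participant.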
