Title
WAFFLE: Watermarking in Federated Learning
Authors
Abstract
Federated learning is a distributed learning technique in which machine learning models are trained on the client devices where the local training data resides. Training is coordinated by a central server, typically controlled by the intended owner of the resulting model. By avoiding the need to transport the training data to the central server, federated learning improves privacy and efficiency. However, it raises the risk of model theft by clients, because the resulting model is available on every client device. Even if the application software used for local training attempts to prevent direct access to the model, a malicious client may bypass such restrictions by reverse engineering the software. Watermarking is a well-known deterrent against model theft: it gives model owners the means to demonstrate ownership of their models. Several recent deep neural network (DNN) watermarking techniques use backdooring: training the model with additional, deliberately mislabeled data. Backdooring requires full access to the training data and control of the training process. This is feasible when a single party trains the model in a centralized manner, but not in a federated learning setting, where the training process and training data are distributed among many client devices. In this paper, we present WAFFLE, the first approach for watermarking DNN models trained using federated learning. It introduces a retraining step at the server after each aggregation of local models into the global model. We show that WAFFLE efficiently embeds a resilient watermark into models, incurring only a negligible degradation in test accuracy (-0.17%), and requires no access to the training data. We also introduce a novel technique for generating the backdoor used as the watermark. It outperforms prior techniques, imposing no communication overhead and only a low computational overhead (+3.2%).
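The server-side loop the abstract describes (aggregate client models, then retrain the global model on a backdoor trigger set that serves as the watermark) can be sketched roughly as follows. This is a toy illustration, not the paper's implementation: a logistic-regression weight vector stands in for a DNN, random noise stands in for local client training, and the trigger set, learning rate, and round counts are made-up values.

```python
import numpy as np

def fedavg(client_weights):
    """Aggregate client models by unweighted averaging (FedAvg-style)."""
    return np.mean(client_weights, axis=0)

def retrain_on_watermark(w, trigger_x, trigger_y, lr=0.1, steps=50):
    """Re-embed the watermark: gradient steps on the trigger set.
    A toy logistic-regression model stands in for the paper's DNNs."""
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-trigger_x @ w))   # sigmoid predictions
        grad = trigger_x.T @ (p - trigger_y) / len(trigger_y)
        w = w - lr * grad
    return w

def watermark_accuracy(w, trigger_x, trigger_y):
    """Fraction of trigger samples classified with their watermark labels."""
    return float(((trigger_x @ w > 0).astype(float) == trigger_y).mean())

rng = np.random.default_rng(0)
dim, n_clients = 8, 5

# Hypothetical trigger set: random inputs with arbitrarily assigned labels,
# acting as the backdoor watermark.
trigger_x = rng.normal(size=(6, dim))
trigger_y = rng.integers(0, 2, size=6).astype(float)

global_w = np.zeros(dim)
for _round in range(10):
    # Stand-in for local training: each client slightly perturbs the model.
    client_weights = [global_w + rng.normal(scale=0.1, size=dim)
                      for _ in range(n_clients)]
    global_w = fedavg(client_weights)
    # WAFFLE's key step: the server retrains on the watermark set
    # after every aggregation, so client updates cannot wash it out.
    global_w = retrain_on_watermark(global_w, trigger_x, trigger_y)

print(watermark_accuracy(global_w, trigger_x, trigger_y))
```

Note that the retraining happens entirely at the server and touches only the trigger set, which matches the abstract's claim that no access to client training data is needed.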