Paper Title
DynaMarks: Defending Against Deep Learning Model Extraction Using Dynamic Watermarking
Paper Authors
Paper Abstract
The functionality of a deep learning (DL) model can be stolen via model extraction, where an attacker obtains a surrogate model by utilizing the responses from a prediction API of the original model. In this work, we propose a novel watermarking technique called DynaMarks to protect the intellectual property (IP) of DL models against such model extraction attacks in a black-box setting. Unlike existing approaches, DynaMarks does not alter the training process of the original model but rather embeds a watermark into a surrogate model by dynamically changing the output responses from the original model's prediction API, based on certain secret parameters, at inference runtime. Experimental results on the Fashion MNIST, CIFAR-10, and ImageNet datasets demonstrate the efficacy of the DynaMarks scheme in watermarking surrogate models while preserving the accuracies of the original models deployed on edge devices. In addition, we perform experiments to evaluate the robustness of DynaMarks against various watermark removal strategies, thus allowing a DL model owner to reliably prove model ownership.
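To illustrate the idea of dynamically altering API responses, the sketch below shows one way such a scheme could look: the prediction API perturbs its output probabilities with noise derived deterministically from a secret key and the query, while leaving the top-1 prediction unchanged so accuracy is preserved. This is a minimal illustration under our own assumptions (the function name `dynamark_response`, the noise magnitude `epsilon`, and the key-derivation scheme are hypothetical), not the actual DynaMarks algorithm from the paper.

```python
import hashlib
import numpy as np

def dynamark_response(probs, query, secret_key, epsilon=0.05):
    """Hypothetical sketch: perturb output probabilities with a
    secret, query-dependent signal, keeping the top-1 class fixed.

    probs      -- 1-D array of class probabilities from the model
    query      -- the input the client submitted (a NumPy array here)
    secret_key -- bytes known only to the model owner
    epsilon    -- maximum perturbation per non-top class (assumed value)
    """
    # Derive a deterministic per-query seed from the secret key and query,
    # so the same query always yields the same watermarked response.
    digest = hashlib.sha256(secret_key + query.tobytes()).digest()
    rng = np.random.default_rng(int.from_bytes(digest[:8], "big"))

    probs = np.asarray(probs, dtype=float)
    top = int(np.argmax(probs))

    # Add small secret-keyed noise to the non-top classes only.
    noise = rng.uniform(-epsilon, epsilon, size=probs.shape)
    noise[top] = 0.0
    perturbed = np.clip(probs + noise, 1e-9, None)

    # Ensure the original top-1 class still wins, then renormalize
    # so the response remains a valid probability distribution.
    perturbed[top] = max(perturbed[top], perturbed.max() + 1e-6)
    perturbed /= perturbed.sum()
    return perturbed
```

An attacker training a surrogate on many such responses would absorb the secret-keyed bias, which the owner could later test for statistically; the key design choice in this sketch is keeping the argmax intact so benign clients see no accuracy loss.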