Paper Title

Removing Backdoor-Based Watermarks in Neural Networks with Limited Data

Authors

Xuankai Liu, Fengting Li, Bihan Wen, Qi Li

Abstract

Deep neural networks have been widely applied and achieved great success in various fields. As training deep models usually consumes massive data and computational resources, trading trained deep models is highly demanded and lucrative nowadays. Unfortunately, naive trading schemes typically involve potential risks related to copyright and trustworthiness issues; e.g., a sold model can be illegally resold to others without further authorization to reap huge profits. To tackle this problem, various watermarking techniques have been proposed to protect model intellectual property, among which backdoor-based watermarking is the most commonly used. However, the robustness of these watermarking approaches is not well evaluated under realistic settings, such as limited in-distribution data availability and no knowledge of the watermarking patterns. In this paper, we benchmark the robustness of watermarking and propose a novel backdoor-based watermark removal framework using limited data, dubbed WILD. The proposed WILD removes the watermarks of deep models with only a small portion of the training data, and the output model can perform the same as models trained from scratch without watermarks injected. In particular, a novel data augmentation method is utilized to mimic the behavior of watermark triggers. Combined with distribution alignment between normal and perturbed (e.g., occluded) data in the feature space, our approach generalizes well to all typical types of trigger contents. The experimental results demonstrate that our approach can effectively remove the watermarks without compromising the deep model's performance on the original task, even with limited access to the training data.
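The abstract names two ingredients: a data augmentation that mimics unknown watermark triggers (e.g., occlusion), and an alignment of the feature distributions of normal and perturbed data. As a rough illustration only — the paper's actual augmentation and alignment objective are not given in the abstract, and all function names below are hypothetical — these two ingredients could be sketched as:

```python
import numpy as np

def occlude(images, patch_size=8, rng=None):
    """Zero out a random square patch in each image, as a stand-in for
    unknown backdoor triggers (hypothetical trigger-mimicking augmentation).
    images: array of shape (N, H, W)."""
    rng = rng or np.random.default_rng(0)
    out = images.copy()
    n, h, w = images.shape
    for i in range(n):
        y = rng.integers(0, h - patch_size + 1)
        x = rng.integers(0, w - patch_size + 1)
        out[i, y:y + patch_size, x:x + patch_size] = 0.0
    return out

def alignment_loss(feat_normal, feat_perturbed):
    """Simple moment-matching proxy for feature-space distribution
    alignment: squared distance between the mean feature vectors of
    normal and perturbed batches (assumption, not the paper's loss)."""
    diff = feat_normal.mean(axis=0) - feat_perturbed.mean(axis=0)
    return float(np.sum(diff ** 2))
```

In a fine-tuning loop, one would add a loss of this kind to the clean-data task loss so that occluded (trigger-like) inputs are pulled toward the normal feature distribution, discouraging the model from keeping a trigger-specific response.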
