SSL-WM: A Black-Box Watermarking Approach for Encoders Pre-trained by Self-supervised Learning

Paper Title

SSL-WM: A Black-Box Watermarking Approach for Encoders Pre-trained by Self-supervised Learning

Authors

Peizhuo Lv, Pan Li, Shenchen Zhu, Shengzhi Zhang, Kai Chen, Ruigang Liang, Chang Yue, Fan Xiang, Yuling Cai, Hualong Ma, Yingjun Zhang, Guozhu Meng

Abstract

Recent years have witnessed tremendous success in Self-Supervised Learning (SSL), which has been widely utilized to facilitate various downstream tasks in Computer Vision (CV) and Natural Language Processing (NLP) domains. However, attackers may steal such SSL models and commercialize them for profit, making it crucial to verify the ownership of the SSL models. Most existing ownership protection solutions (e.g., backdoor-based watermarks) are designed for supervised learning models and cannot be used directly since they require that the models' downstream tasks and target labels be known and available during watermark embedding, which is not always possible in the domain of SSL. To address such a problem, especially when downstream tasks are diverse and unknown during watermark embedding, we propose a novel black-box watermarking solution, named SSL-WM, for verifying the ownership of SSL models. SSL-WM maps watermarked inputs of the protected encoders into an invariant representation space, which causes any downstream classifier to produce expected behavior, thus allowing the detection of embedded watermarks. We evaluate SSL-WM on numerous tasks, such as CV and NLP, using different SSL models both contrastive-based and generative-based. Experimental results demonstrate that SSL-WM can effectively verify the ownership of stolen SSL models in various downstream tasks. Furthermore, SSL-WM is robust against model fine-tuning, pruning, and input preprocessing attacks. Lastly, SSL-WM can also evade detection from evaluated watermark detection approaches, demonstrating its promising application in protecting the ownership of SSL models.
