Paper title
Generating Image Adversarial Examples by Embedding Digital Watermarks
Paper authors
Paper abstract
As deep neural network (DNN) models attract increasing attention, attacks against them are emerging as well. For example, an attacker may carefully craft images (known as adversarial examples) that mislead a DNN model into outputting incorrect classification results. Correspondingly, many defenses have been proposed to detect and mitigate adversarial examples, usually targeting specific attacks. In this paper, we propose a novel digital-watermark-based method for generating image adversarial examples that fool DNN models. Specifically, part of the main features of a watermark image is embedded almost invisibly into a host image, with the aim of tampering with and degrading the recognition capability of DNN models. We devise an efficient mechanism for selecting host images and watermark images, and we use an improved discrete wavelet transform (DWT) based Patchwork watermarking algorithm, with a set of valid hyperparameters, to embed digital watermarks from the watermark image dataset into the original images and thereby generate image adversarial examples. Experimental results show that the attack success rate against common DNN models averages 95.47% on the CIFAR-10 dataset and peaks at 98.71%. Moreover, our scheme generates large numbers of adversarial examples efficiently: attacking each image in the CIFAR-10 dataset takes 1.17 seconds on average. In addition, we design a baseline experiment that uses watermark images generated from Gaussian noise as the watermark image dataset, which further demonstrates the effectiveness of our scheme. We also propose a modified discrete cosine transform (DCT) based Patchwork watermarking algorithm. To ensure repeatability and reproducibility, the source code is available on GitHub.
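To make the idea of DWT-domain embedding concrete, the following is a minimal sketch of blending a watermark into the low-frequency sub-band of a one-level Haar wavelet transform. It is not the improved Patchwork algorithm described in the abstract, whose details and hyperparameters are not given here. The function names (`haar_dwt2`, `haar_idwt2`, `embed_watermark`), the blending rule, and the strength `alpha` are all assumptions chosen for illustration. Images are assumed to be grayscale, equally sized, with even height and width, and stored as lists of rows.

```python
# Illustrative sketch only: generic DWT-domain blending, NOT the paper's
# improved Patchwork algorithm. All names and the alpha value are assumptions.


def haar_dwt2(img):
    """One-level 2D orthonormal Haar DWT of an even-sized grayscale image.

    Returns four sub-bands: the approximation (ll) and three detail bands.
    """
    ll, d1, d2, d3 = [], [], [], []
    for i in range(0, len(img), 2):
        r_ll, r_d1, r_d2, r_d3 = [], [], [], []
        for j in range(0, len(img[0]), 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            r_ll.append((a + b + c + d) / 2)  # low-frequency average
            r_d1.append((a - b + c - d) / 2)  # difference across columns
            r_d2.append((a + b - c - d) / 2)  # difference across rows
            r_d3.append((a - b - c + d) / 2)  # diagonal difference
        ll.append(r_ll)
        d1.append(r_d1)
        d2.append(r_d2)
        d3.append(r_d3)
    return ll, d1, d2, d3


def haar_idwt2(ll, d1, d2, d3):
    """Inverse of haar_dwt2: rebuild the image from its four sub-bands."""
    h, w = len(ll), len(ll[0])
    img = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            s, x, y, z = ll[i][j], d1[i][j], d2[i][j], d3[i][j]
            img[2 * i][2 * j] = (s + x + y + z) / 2
            img[2 * i][2 * j + 1] = (s - x + y - z) / 2
            img[2 * i + 1][2 * j] = (s + x - y - z) / 2
            img[2 * i + 1][2 * j + 1] = (s - x - y + z) / 2
    return img


def embed_watermark(host, mark, alpha=0.1):
    """Blend the watermark's approximation band into the host's.

    alpha is a hypothetical embedding strength: 0 leaves the host unchanged,
    larger values make the watermark more visible.
    """
    h_ll, h_d1, h_d2, h_d3 = haar_dwt2(host)
    m_ll, _, _, _ = haar_dwt2(mark)
    mixed = [
        [(1 - alpha) * hv + alpha * mv for hv, mv in zip(h_row, m_row)]
        for h_row, m_row in zip(h_ll, m_ll)
    ]
    out = haar_idwt2(mixed, h_d1, h_d2, h_d3)
    # Clamp to the valid 8-bit pixel range.
    return [[min(255.0, max(0.0, v)) for v in row] for row in out]
```

Calling `embed_watermark(host, mark, alpha)` with a small `alpha` yields an image that stays close to the host while carrying the coarse structure of the watermark. In an attack setting, one would then check whether a classifier's prediction on the result differs from its prediction on the host. That check, and the selection of host and watermark images, is outside this sketch.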