Paper Title

Verifying Attention Robustness of Deep Neural Networks against Semantic Perturbations

Authors

Satoshi Munakata, Caterina Urban, Haruki Yokoyama, Koji Yamamoto, Kazuki Munakata

Abstract

It is known that deep neural networks (DNNs) classify an input image by paying particular attention to certain specific pixels; a graphical representation of the magnitude of attention to each pixel is called a saliency-map. Saliency-maps are used to check the validity of the classification decision basis, e.g., it is not a valid basis for classification if a DNN pays more attention to the background rather than the subject of an image. Semantic perturbations can significantly change the saliency-map. In this work, we propose the first verification method for attention robustness, i.e., the local robustness of the changes in the saliency-map against combinations of semantic perturbations. Specifically, our method determines the range of the perturbation parameters (e.g., the brightness change) that maintains the difference between the actual saliency-map change and the expected saliency-map change below a given threshold value. Our method is based on activation region traversals, focusing on the outermost robust boundary for scalability on larger DNNs. Experimental results demonstrate that our method can show the extent to which DNNs can classify with the same basis regardless of semantic perturbations and report on performance and performance factors of activation region traversals.
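The core idea the abstract describes — finding the range of a semantic-perturbation parameter (e.g., a brightness shift) for which the saliency-map change stays below a threshold — can be illustrated with a toy empirical sweep. The sketch below is not the paper's verification method (which operates symbolically via activation region traversal); it uses an assumed tiny ReLU network, input-gradient magnitude as the saliency map, and an arbitrary threshold, all purely for illustration.

```python
import numpy as np

# Illustrative assumptions: a tiny 2-layer ReLU network over a 4-"pixel"
# input, saliency defined as |d(top logit)/d(input)|, and an arbitrary
# threshold on the saliency-map change. None of these come from the paper.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)) * 0.5, np.zeros(8)
W2, b2 = rng.normal(size=(2, 8)) * 0.5, np.zeros(2)

def forward(x):
    h = W1 @ x + b1
    return W2 @ np.maximum(h, 0.0) + b2, (h > 0)   # logits, ReLU mask

def saliency(x):
    """Gradient of the top logit w.r.t. each input pixel (attention map)."""
    logits, mask = forward(x)
    c = int(np.argmax(logits))
    return np.abs(W1.T @ (mask * W2[c]))           # backprop through ReLU

x0 = rng.uniform(0.2, 0.8, size=4)                 # nominal input image
s0 = saliency(x0)

threshold = 0.05                                   # assumed tolerance
robust = [eps for eps in np.linspace(-0.3, 0.3, 61)
          if np.linalg.norm(saliency(np.clip(x0 + eps, 0, 1)) - s0)
          <= threshold]                            # brightness-shift sweep

print(f"attention-robust brightness range ≈ "
      f"[{min(robust):.2f}, {max(robust):.2f}]")
```

An empirical grid sweep like this only samples the parameter space; the point of the paper's approach is to certify the whole parameter range by traversing the network's linear activation regions instead.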
