Ensemble Generative Cleaning with Feedback Loops for Defending Adversarial Attacks

Paper Title

Ensemble Generative Cleaning with Feedback Loops for Defending Adversarial Attacks

Authors

Jianhe Yuan, Zhihai He

Abstract

Effective defense of deep neural networks against adversarial attacks remains a challenging problem, especially under powerful white-box attacks. In this paper, we develop a new method called ensemble generative cleaning with feedback loops (EGC-FL) for effective defense of deep neural networks. The proposed EGC-FL method is based on two central ideas. First, we introduce a transformed deadzone layer into the defense network, which consists of an orthonormal transform and a deadzone-based activation function, to destroy the sophisticated noise pattern of adversarial attacks. Second, by constructing a generative cleaning network with a feedback loop, we are able to generate an ensemble of diverse estimations of the original clean image. We then learn a network to fuse this set of diverse estimations together to restore the original image. Our extensive experimental results demonstrate that our approach improves on the state of the art by large margins in both white-box and black-box attacks. Under white-box PGD attacks, it improves classification accuracy over the second-best method by more than 29% on the SVHN dataset and by more than 39% on the challenging CIFAR-10 dataset.
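
The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the transform, the deadzone threshold, the feedback structure, or the fusion network. Everything below is an assumption made for illustration, including the random orthonormal matrix, the threshold `delta`, the mixing rule in the hypothetical `ensemble_clean` loop, and the plain average that stands in for the learned fusion network.

```python
import numpy as np


def random_orthonormal(n, seed=0):
    """Return an n x n orthonormal matrix (assumption: QR of a Gaussian matrix)."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q


def deadzone(z, delta):
    """Zero every coefficient with magnitude below delta; pass the rest unchanged."""
    return np.where(np.abs(z) < delta, 0.0, z)


def transformed_deadzone(x, q, delta):
    """Transform x, apply the deadzone, and transform back (q.T inverts q)."""
    return q.T @ deadzone(q @ x, delta)


def ensemble_clean(x_adv, q, delta, steps=4, mix=0.5):
    """Hypothetical feedback loop: feed each estimate back in and keep all of them."""
    estimates = []
    x = x_adv
    for _ in range(steps):
        # Blend the attacked input with the cleaned previous estimate.
        x = mix * x_adv + (1.0 - mix) * transformed_deadzone(x, q, delta)
        estimates.append(x)
    # Stand-in for the learned fusion network: a plain average of the estimates.
    return np.mean(estimates, axis=0), estimates
```

Here `x_adv` is a flattened image vector. Small transform coefficients, where low-amplitude adversarial noise is assumed to concentrate, are suppressed, while large coefficients pass through unchanged.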
