Paper Title
Adversarial Profiles: Detecting Out-Distribution & Adversarial Samples in Pre-trained CNNs
Paper Authors
Paper Abstract
Despite the high accuracy of Convolutional Neural Networks (CNNs), they are vulnerable to adversarial and out-distribution examples. Many methods have been proposed to detect these fooling examples or to make CNNs robust against them. However, most such methods need access to a wide range of fooling examples to retrain the network or to tune detection parameters. Here, we propose a method for detecting adversarial and out-distribution examples against a pre-trained CNN, without retraining the CNN or requiring access to a wide variety of fooling examples. To this end, we create adversarial profiles for each class using only one adversarial attack generation technique. We then wrap a detector around the pre-trained CNN; the detector applies the created adversarial profiles to each input and uses the resulting outputs to decide whether or not the input is legitimate. Our initial evaluation of this approach on the MNIST dataset shows that adversarial-profile-based detection is effective, detecting at least 92% of out-distribution examples and 59% of adversarial examples.
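The abstract describes the detector only at a high level. The Python sketch below is a minimal illustration of the wrapping idea under stated assumptions: a classifier exposing a `predict(x)` interface, one precomputed perturbation ("profile") per class with a known expected target class, and a majority-agreement decision rule. The interface names and the decision rule are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

class AdversarialProfileDetector:
    """Illustrative detector wrapped around a pre-trained classifier.

    Assumptions (not from the paper's abstract):
      - model.predict(x) returns the predicted class index for one input.
      - profiles[k] is a perturbation crafted, with a single attack
        technique, to steer genuine class-k inputs toward targets[k].
      - Inputs are scaled to [0, 1] (MNIST-like).
    """

    def __init__(self, model, profiles, targets, min_agreement=0.6):
        self.model = model
        self.profiles = profiles          # dict: class k -> perturbation array
        self.targets = targets            # dict: class k -> expected post-perturbation label
        self.min_agreement = min_agreement  # fraction of profiles that must react as expected

    def is_legitimate(self, x):
        c = self.model.predict(x)  # prediction on the clean input
        agree, total = 0, 0
        for k, delta in self.profiles.items():
            y = self.model.predict(np.clip(x + delta, 0.0, 1.0))
            if k == c:
                # The profile built for the predicted class should steer a
                # genuine class-c input to that profile's target class.
                agree += int(y == self.targets[k])
            else:
                # Profiles built for other classes are assumed to leave a
                # genuine class-c prediction largely unchanged.
                agree += int(y == c)
            total += 1
        return total > 0 and agree / total >= self.min_agreement
```

A caller would build `profiles` and `targets` offline from the pre-trained CNN (one attack technique, one profile per class) and then filter incoming inputs with `is_legitimate` before trusting the CNN's prediction.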