Utilizing Network Properties to Detect Erroneous Inputs

Paper Title

Utilizing Network Properties to Detect Erroneous Inputs

Authors

Matt Gorbett, Nathaniel Blanchard

Abstract

Neural networks are vulnerable to a wide range of erroneous inputs such as adversarial, corrupted, out-of-distribution, and misclassified examples. In this work, we train a linear SVM classifier to detect these four types of erroneous data using hidden and softmax feature vectors of pre-trained neural networks. Our results indicate that these faulty data types generally exhibit linearly separable activation properties from correct examples, giving us the ability to reject bad inputs with no extra training or overhead. We experimentally validate our findings across a diverse range of datasets, domains, pre-trained models, and adversarial attacks.
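The detection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature vectors are synthetic Gaussian stand-ins for the hidden and softmax activations of a pre-trained network, the dimension, class centers, and hyperparameters are arbitrary assumptions, and the linear SVM is a small hand-written hinge-loss SGD trainer chosen so the example needs only the standard library (a library implementation such as scikit-learn's `LinearSVC` would be the more usual choice).

```python
import random

DIM = 8  # hypothetical feature size; real activation vectors are much larger

def fake_activations(n, center, rng):
    # Synthetic stand-in for hidden + softmax feature vectors (assumption).
    return [[rng.gauss(center, 1.0) for _ in range(DIM)] for _ in range(n)]

def score(w, b, x):
    # Signed distance-like score of x relative to the separating hyperplane.
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, lr=0.01, lam=0.001, epochs=30, seed=0):
    # Plain SGD on the L2-regularised hinge loss; labels must be +1 / -1.
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            violated = y[i] * score(w, b, X[i]) < 1.0
            for j in range(len(w)):
                grad = lam * w[j] - (y[i] * X[i][j] if violated else 0.0)
                w[j] -= lr * grad
            if violated:
                b += lr * y[i]
    return w, b

rng = random.Random(0)

def dataset(n):
    # Label -1: features of correct inputs; label +1: features of erroneous inputs.
    X = fake_activations(n, 0.0, rng) + fake_activations(n, 2.0, rng)
    y = [-1] * n + [1] * n
    return X, y

X_train, y_train = dataset(200)
X_test, y_test = dataset(100)

w, b = train_linear_svm(X_train, y_train)
preds = [1 if score(w, b, x) >= 0 else -1 for x in X_test]
accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
print(f"held-out detection accuracy: {accuracy:.3f}")
```

In this sketch an input would be rejected whenever `score(w, b, x) >= 0`. The pre-trained network itself is never updated; only the small linear detector on top of its activations is fitted.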
