Paper Title

Closeness and Uncertainty Aware Adversarial Examples Detection in Adversarial Machine Learning

Paper Authors

Omer Faruk Tuna, Ferhat Ozgur Catak, M. Taner Eskil

Paper Abstract

While state-of-the-art Deep Neural Network (DNN) models are considered robust to random perturbations, it has been shown that these architectures are highly vulnerable to deliberately crafted, quasi-imperceptible perturbations. These vulnerabilities make it challenging to deploy DNN models in security-critical areas. In recent years, many research studies have been conducted to develop new attack methods and to come up with new defense techniques that enable more robust and reliable models. In this work, we explore and assess the usage of different types of metrics for detecting adversarial samples. We first leverage moment-based predictive uncertainty estimates of a DNN classifier obtained using Monte-Carlo Dropout Sampling. We also introduce a new method that operates in the subspace of deep features extracted by the model. We verified the effectiveness of our approach on a range of standard datasets such as MNIST (Digit), MNIST (Fashion), and CIFAR-10. Our experiments show that these two different approaches complement each other, and the combined usage of all the proposed metrics yields ROC-AUC scores of up to 99% regardless of the attack algorithm.
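
The first detection signal described in the abstract, moment-based predictive uncertainty from Monte-Carlo Dropout, can be illustrated with a short sketch. The following is a minimal illustration in PyTorch, not the authors' implementation: the network architecture, dropout rate, and number of stochastic forward passes are all assumptions chosen for readability.

import torch
import torch.nn as nn

# Illustrative dropout classifier (architecture and sizes are assumptions,
# not the model used in the paper).
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 256), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(256, 10),
)

def mc_dropout_predict(model, x, n_samples=50):
    # Keep dropout active at inference time so each forward pass is stochastic.
    model.train()
    with torch.no_grad():
        # Returned shape: (n_samples, batch, classes)
        return torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(n_samples)]
        )

x = torch.randn(4, 1, 28, 28)           # stand-in batch of MNIST-sized inputs
probs = mc_dropout_predict(model, x)

mean_probs = probs.mean(dim=0)          # first moment: predictive mean
variance = probs.var(dim=0)             # second moment: predictive variance
# High variance (or high entropy of the mean prediction) is the kind of
# uncertainty signal a detector could threshold to flag adversarial inputs.
entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=-1)

In practice, a detector would compute such moments for each incoming sample and compare them against values observed on clean data; the paper combines these uncertainty metrics with closeness metrics computed in the subspace of deep features extracted by the model.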
