Determining Sequence of Image Processing Technique (IPT) to Detect Adversarial Attacks

Paper title

Determining Sequence of Image Processing Technique (IPT) to Detect Adversarial Attacks

Authors

Gupta, Kishor Datta; Akhtar, Zahid; Dasgupta, Dipankar

Abstract

Securing machine learning models against adversarial examples is challenging because new methods for generating adversarial attacks are continually being developed. In this work, we propose an evolutionary approach that automatically determines an Image Processing Techniques Sequence (IPTS) for detecting malicious inputs. We first use a diverse set of attack methods, including adaptive attacks on our own defense, to generate adversarial samples from a clean dataset. A detection framework based on a genetic algorithm (GA) then searches for the optimal IPTS, where optimality is estimated by different fitness measures such as Euclidean distance, entropy loss, average histogram, local binary pattern, and loss functions. The "image difference" between the original and processed images is used to extract features, which are fed to a classification scheme that determines whether the input sample is adversarial or clean. This paper describes our methodology and reports experiments on multiple datasets tested with several adversarial attacks. For each attack type and dataset, the framework generates a unique IPTS. A set of IPTS, selected dynamically at test time, works as a filter against adversarial attacks. Our empirical experiments show promising results, indicating that the approach can be used efficiently as a processing step for any AI model.
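
The search described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The IPT pool (`box_blur`, `quantize`, `sharpen`), the function names, the GA settings, and the synthetic "adversarial" noise are all assumptions made for illustration, and only the Euclidean-distance fitness measure is used.

```python
import numpy as np

# Hypothetical IPT pool: each function maps an HxW float image in [0, 1]
# to an image of the same shape. The paper's actual IPTs are not listed here.
def identity(img):
    return img

def box_blur(img):
    # 3x3 mean filter with edge padding.
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    acc = np.zeros_like(img)
    for dy in range(3):
        for dx in range(3):
            acc += padded[dy:dy + h, dx:dx + w]
    return acc / 9.0

def quantize(img):
    # Reduce to 8 grey levels.
    return np.round(img * 7) / 7

def sharpen(img):
    # Unsharp masking, clipped back to [0, 1].
    return np.clip(2 * img - box_blur(img), 0, 1)

IPTS = [identity, box_blur, quantize, sharpen]

def apply_sequence(seq, img):
    """Apply the IPTs indexed by `seq`, in order."""
    for idx in seq:
        img = IPTS[idx](img)
    return img

def fitness(seq, clean, adv):
    """Assumed fitness: mean Euclidean 'image difference' on adversarial
    samples minus the same quantity on clean samples."""
    d_clean = np.mean([np.linalg.norm(x - apply_sequence(seq, x)) for x in clean])
    d_adv = np.mean([np.linalg.norm(x - apply_sequence(seq, x)) for x in adv])
    return d_adv - d_clean

def evolve(clean, adv, seq_len=3, pop_size=12, generations=15, seed=0):
    """Tiny GA (requires seq_len >= 2): keep the top half, then refill the
    population with one-point crossover and occasional mutation."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, len(IPTS), size=(pop_size, seq_len))
    for _ in range(generations):
        scores = np.array([fitness(s, clean, adv) for s in pop])
        parents = pop[np.argsort(scores)[::-1][: pop_size // 2]]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, seq_len)
            child = np.concatenate([a[:cut], b[cut:]])
            if rng.random() < 0.3:
                child[rng.integers(seq_len)] = rng.integers(len(IPTS))
            children.append(child)
        pop = np.vstack([parents, np.array(children)])
    scores = np.array([fitness(s, clean, adv) for s in pop])
    return pop[int(np.argmax(scores))].tolist(), float(scores.max())

# Synthetic stand-in data: smooth "clean" images plus sign noise as a
# placeholder for real adversarial perturbations.
data_rng = np.random.default_rng(1)
clean = [box_blur(data_rng.random((16, 16))) for _ in range(8)]
adv = [np.clip(x + 0.05 * data_rng.choice([-1.0, 1.0], size=x.shape), 0, 1)
       for x in clean]

best_seq, best_score = evolve(clean, adv)
print("best IPT sequence:", [IPTS[i].__name__ for i in best_seq])
print("fitness:", best_score)
```

In a fuller pipeline, features extracted from the image difference `x - apply_sequence(best_seq, x)` would be passed to a classifier that labels each input as adversarial or clean. That stage is omitted here.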
