A Multi-objective Memetic Algorithm for Auto Adversarial Attack Optimization Design

Paper Title

A Multi-objective Memetic Algorithm for Auto Adversarial Attack Optimization Design

Authors

Jialiang Sun, Wen Yao, Tingsong Jiang, Xiaoqian Chen

Abstract

Adversarial examples have been observed in a wide range of scenarios. Recent studies show that well-designed adversarial defense strategies can improve the robustness of deep learning models against adversarial examples. However, as defense techniques develop rapidly, the robustness of a defended model becomes harder to evaluate, because existing manually designed adversarial attacks perform weakly against it. To address this challenge, an efficient adversarial attack is needed for a given defended model, one that reaches lower robust accuracy at a smaller computational cost. We therefore propose a multi-objective memetic algorithm for auto adversarial attack optimization design, which automatically searches for a near-optimal adversarial attack against defended models. First, we construct a more general mathematical model of auto adversarial attack optimization design, in which the search space includes not only the attacker operations, magnitudes, iteration numbers, and loss functions but also the ways in which multiple adversarial attacks are connected. Second, we develop a multi-objective memetic algorithm that combines NSGA-II with local search to solve the optimization problem. Finally, to reduce the evaluation cost during the search, we propose a representative data selection strategy based on sorting the cross-entropy loss values that the model outputs for each image. Experiments on the CIFAR10, CIFAR100, and ImageNet datasets show the effectiveness of the proposed method.
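
The abstract names two objectives for a candidate attack, robust accuracy and computational burden, both of which should be low. NSGA-II ranks candidates by Pareto dominance over such objectives, so a minimal sketch of the non-dominated filter may help make the idea concrete. This is not the paper's implementation: the function names, the `(robust_accuracy, cost)` tuple layout, and the candidate scores are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical score layout: (robust_accuracy, cost), both to be minimized.
Score = Tuple[float, float]


def dominates(a: Score, b: Score) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores: List[Score]) -> List[int]:
    """Return the indices of candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Made-up scores for five candidate attacks (not results from the paper).
candidates = [(0.52, 100.0), (0.48, 400.0), (0.55, 90.0), (0.50, 500.0), (0.48, 300.0)]
front = pareto_front(candidates)
print(front)  # indices of the non-dominated candidates
```

In this toy example the second and fourth candidates are dropped because another candidate matches or beats them on robust accuracy at a lower cost. A full NSGA-II would repeat this filter to build successive fronts and add crowding-distance selection, and the memetic variant described in the abstract would additionally apply local search.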
