Paper Title

Defense-guided Transferable Adversarial Attacks

Paper Authors

Zifei Zhang, Kai Qiao, Jian Chen, Ningning Liang

Paper Abstract

Though deep neural networks perform excellently on challenging tasks, they are susceptible to adversarial examples, which mislead classifiers by applying human-imperceptible perturbations to clean inputs. Under the query-free black-box scenario, adversarial examples are hard to transfer to unknown models, and although several methods have been proposed, their transferability remains low. To address this issue, we design a max-min framework inspired by input transformations, which are beneficial to both the adversarial attack and the defense. Explicitly, we decrease loss values with affine transformations of the inputs as a defense in the minimum procedure, and then increase loss values with the momentum iterative algorithm as an attack in the maximum procedure. To further promote transferability, we determine the transformed values with the max-min theory. Extensive experiments on ImageNet demonstrate that our defense-guided transferable attacks achieve an impressive increase in transferability. Experimentally, we show that the ASR of our adversarial attack reaches 58.38% on average, which outperforms the state-of-the-art method by 12.1% on normally trained models and by 11.13% on adversarially trained models. Additionally, we provide elucidative insights into the improvement of transferability, and our method is expected to serve as a benchmark for assessing the robustness of deep models.
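To make the max-min procedure described in the abstract concrete, the sketch below first searches a small set of affine (rescaling) transforms for the one that minimizes the classification loss (the defense, i.e. the minimum step), and then applies a momentum iterative FGSM update using the gradient at that defended input (the attack, i.e. the maximum step). This is a minimal PyTorch sketch under stated assumptions, not the authors' released code: the helper names (`rescale`, `best_defense_scale`, `dgta_attack`), the candidate scale set, and the hyper-parameters (`eps`, `steps`, `mu`) are illustrative.

```python
# Minimal sketch of a defense-guided max-min attack, assuming PyTorch.
# Helper names, the scale set, and hyper-parameters are illustrative assumptions,
# not the authors' released implementation.
import torch
import torch.nn.functional as F


def rescale(x, s):
    """Affine (scaling) transform: resize by factor s, then back to the original size."""
    h, w = x.shape[-2:]
    x_s = F.interpolate(x, scale_factor=s, mode="bilinear", align_corners=False)
    return F.interpolate(x_s, size=(h, w), mode="bilinear", align_corners=False)


def best_defense_scale(model, x, y, scales=(0.9, 0.95, 1.0, 1.05, 1.1)):
    """Minimum procedure: pick the transform that most decreases the loss (the defense)."""
    with torch.no_grad():
        losses = [F.cross_entropy(model(rescale(x, s)), y).item() for s in scales]
    return scales[min(range(len(scales)), key=losses.__getitem__)]


def dgta_attack(model, x, y, eps=16 / 255, steps=10, mu=1.0):
    """Maximum procedure: momentum iterative FGSM computed at the defended input."""
    alpha = eps / steps
    x_adv = x.clone().detach()
    g = torch.zeros_like(x)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        s = best_defense_scale(model, x_adv, y)              # inner (min) step: defense
        loss = F.cross_entropy(model(rescale(x_adv, s)), y)  # loss at the defended input
        grad = torch.autograd.grad(loss, x_adv)[0]           # outer (max) step: attack
        g = mu * g + grad / grad.abs().mean()                # momentum accumulation
        x_adv = x_adv.detach() + alpha * g.sign()
        x_adv = torch.max(torch.min(x_adv, x + eps), x - eps).clamp(0, 1)
    return x_adv.detach()
```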
