Paper Title
Minimizing Maximum Model Discrepancy for Transferable Black-box Targeted Attacks
Authors
Abstract
In this work, we study the black-box targeted attack problem from the model discrepancy perspective. On the theoretical side, we present a generalization error bound for black-box targeted attacks, which provides a rigorous basis for guaranteeing attack success. We reveal that the attack error on a target model depends mainly on the empirical attack error on the substitute model and the maximum model discrepancy among substitute models. On the algorithmic side, we derive a new algorithm for black-box targeted attacks based on our theoretical analysis, in which we additionally minimize the maximum model discrepancy (M3D) of the substitute models when training the generator that produces adversarial examples. In this way, our model is capable of crafting highly transferable adversarial examples that are robust to model variation, thus improving the success rate of attacking the black-box model. We conduct extensive experiments on the ImageNet dataset with different classification models, and our proposed approach outperforms existing state-of-the-art methods by a significant margin. Our code will be released.
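To make the min-max objective sketched in the abstract concrete, below is a minimal PyTorch-style sketch of one training step. Everything in it is an illustrative assumption rather than the paper's exact formulation: the two substitute models `f1`/`f2`, the symmetric-KL choice of discrepancy, the tanh-bounded generator output, and the perturbation budget `eps`. Schematically, the bound motivates alternating updates: the substitute models are trained to maximize their disagreement on the crafted examples, while the generator is trained to fool both models toward the target class and to shrink that worst-case disagreement.

```python
# Hedged sketch of an M3D-style min-max training step (assumed setup, not
# the paper's exact algorithm). G is a perturbation generator; f1 and f2
# are two substitute classifiers; opt_g and opt_f optimize G and (f1, f2).
import torch
import torch.nn.functional as F


def symmetric_kl(p_logits, q_logits):
    """One common choice of model discrepancy: symmetric KL between
    the softmax outputs of the two substitute models (an assumption)."""
    p = F.log_softmax(p_logits, dim=1)
    q = F.log_softmax(q_logits, dim=1)
    return (F.kl_div(p, q.exp(), reduction="batchmean")
            + F.kl_div(q, p.exp(), reduction="batchmean")) / 2


def train_step(G, f1, f2, x, target, opt_g, opt_f, eps=16 / 255):
    # Craft L-infinity-bounded adversarial examples with the generator.
    x_adv = torch.clamp(x + eps * torch.tanh(G(x)), 0, 1)

    # Maximization step: update the substitute models to MAXIMIZE their
    # discrepancy on the current adversarial examples (x_adv is detached
    # so this step does not touch the generator's graph).
    d = symmetric_kl(f1(x_adv.detach()), f2(x_adv.detach()))
    opt_f.zero_grad()
    (-d).backward()
    opt_f.step()

    # Minimization step: update the generator to push both substitute
    # models toward the target class while MINIMIZING the (now maximal)
    # discrepancy, yielding examples robust to model variation.
    l1, l2 = f1(x_adv), f2(x_adv)
    attack_loss = F.cross_entropy(l1, target) + F.cross_entropy(l2, target)
    loss = attack_loss + symmetric_kl(l1, l2)
    opt_g.zero_grad()
    loss.backward()
    opt_g.step()
    return loss.item()
```

The design intuition follows the stated bound: if the target-model attack error is controlled by the substitute attack error plus the maximum model discrepancy, then examples on which even maximally disagreeing substitutes both predict the target class should transfer to unseen black-box models.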