Paper Title
Imbalanced Gradients: A Subtle Cause of Overestimated Adversarial Robustness
Paper Authors
Paper Abstract
Evaluating the robustness of a defense model is a challenging task in adversarial robustness research. Obfuscated gradients have previously been found to exist in many defense methods and cause a false sense of robustness. In this paper, we identify a more subtle situation called Imbalanced Gradients that can also cause overestimated adversarial robustness. The phenomenon of imbalanced gradients occurs when the gradient of one term of the margin loss dominates and pushes the attack in a suboptimal direction. To exploit imbalanced gradients, we formulate a Margin Decomposition (MD) attack that decomposes the margin loss into individual terms and then explores the attackability of these terms separately via a two-stage process. We also propose multi-targeted and ensemble versions of our MD attack. By investigating 24 defense models proposed since 2018, we find that 11 models are susceptible to a certain degree of imbalanced gradients, and our MD attack can reduce their robustness, as evaluated by the best standalone baseline attack, by more than 1%. We also provide an in-depth investigation into the likely causes of imbalanced gradients and effective countermeasures. Our code is available at https://github.com/HanxunH/MDAttack.
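As a rough illustration of the idea described in the abstract, the sketch below decomposes the margin loss Z_y(x) - max_{i≠y} Z_i(x) into its two terms and runs a two-stage PGD-style attack that first follows a single term before switching to the full margin. The function names (`margin_terms`, `md_attack`), the stage schedule, and the step sizes are illustrative assumptions, not the authors' reference implementation; see the linked repository for the actual MD attack.

```python
# Minimal sketch of margin-loss decomposition for a two-stage attack,
# assuming a PyTorch classifier `model` that returns logits.
# Illustrative only; the official code is at https://github.com/HanxunH/MDAttack.
import torch


def margin_terms(logits, y):
    """Split the margin loss Z_y - max_{i!=y} Z_i into its two terms."""
    true_logit = logits.gather(1, y.unsqueeze(1)).squeeze(1)   # Z_y
    masked = logits.clone()
    masked.scatter_(1, y.unsqueeze(1), float('-inf'))          # hide the true class
    runnerup_logit = masked.max(dim=1).values                  # max_{i!=y} Z_i
    return true_logit, runnerup_logit


def md_attack(model, x, y, eps=8/255, alpha=2/255, stage1=20, stage2=80):
    """Two-stage L_inf PGD sketch: stage 1 follows only one margin term,
    stage 2 switches to the full margin loss (stage lengths are assumptions)."""
    x_adv = x.clone().detach()
    for step in range(stage1 + stage2):
        x_adv.requires_grad_(True)
        true_logit, runnerup_logit = margin_terms(model(x_adv), y)
        if step < stage1:
            loss = -true_logit.sum()                        # stage 1: push down Z_y only
        else:
            loss = (runnerup_logit - true_logit).sum()      # stage 2: full margin loss
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()             # ascent step on the loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)        # project to the L_inf ball
            x_adv = x_adv.clamp(0.0, 1.0)                   # keep a valid image
    return x_adv.detach()
```

The point of the decomposition is that when one term's gradient dominates (imbalanced gradients), optimizing the full margin loss alone can stall in a suboptimal direction, whereas attacking the terms separately first can escape that regime.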