Paper Title

Rethinking Empirical Evaluation of Adversarial Robustness Using First-Order Attack Methods

Authors

Kyungmi Lee and Anantha P. Chandrakasan

Abstract

We identify three common cases that lead to overestimation of adversarial accuracy against bounded first-order attack methods, which is popularly used as a proxy for adversarial robustness in empirical studies. For each case, we propose compensation methods that either address sources of inaccurate gradient computation, such as numerical instability near zero and non-differentiability, or reduce the total number of back-propagations for iterative attacks by approximating second-order information. These compensation methods can be combined with existing attack methods for a more precise empirical evaluation metric. We illustrate the impact of these three cases with examples of practical interest, such as benchmarking model capacity and regularization techniques for robustness. Overall, our work shows that overestimated adversarial accuracy that is not indicative of robustness is prevalent even for conventionally trained deep neural networks, and highlights cautions of using empirical evaluation without guaranteed bounds.
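For context, the "bounded first-order attack methods" the abstract refers to (e.g. projected gradient descent, PGD) repeatedly step in the direction of the loss gradient and project the perturbed input back into an ℓ∞ ball around the original. The sketch below is a minimal illustration on a hand-written logistic model with an analytic gradient; the function name, toy model, and parameters are illustrative assumptions, not the paper's code.

```python
import numpy as np

def pgd_linf(x, y, w, b, eps=0.3, alpha=0.05, steps=10):
    """Toy ell-inf PGD against a logistic model sigmoid(w.x + b).

    Ascends the binary cross-entropy loss and projects the iterate
    back into the eps-ball around the clean input x.
    """
    x_adv = x.copy()
    for _ in range(steps):
        z = w @ x_adv + b
        p = 1.0 / (1.0 + np.exp(-z))            # sigmoid prediction
        grad = (p - y) * w                      # analytic dLoss/dx for BCE
        x_adv = x_adv + alpha * np.sign(grad)   # first-order ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project into eps-ball
    return x_adv

# toy example: one 2-d point with label 1
w = np.array([1.0, -2.0]); b = 0.0
x = np.array([0.5, -0.5]); y = 1.0
x_adv = pgd_linf(x, y, w, b)
```

The three cases the paper identifies (numerical instability near zero, non-differentiability, and too few back-propagations) all cause the `grad` term in such a loop to be inaccurate or too coarse, which makes the attack weaker than it should be and inflates the measured adversarial accuracy.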
