Paper Title

Explicit Tradeoffs between Adversarial and Natural Distributional Robustness

Authors

Mazda Moayeri, Kiarash Banihashem, Soheil Feizi

Abstract

Several existing works study either adversarial or natural distributional robustness of deep neural networks separately. In practice, however, models need to enjoy both types of robustness to ensure reliability. In this work, we bridge this gap and show that, in fact, explicit tradeoffs exist between adversarial and natural distributional robustness. We first consider a simple linear regression setting on Gaussian data with disjoint sets of core and spurious features. In this setting, through theoretical and empirical analysis, we show that (i) adversarial training with $\ell_1$ and $\ell_2$ norms increases the model's reliance on spurious features; (ii) for $\ell_\infty$ adversarial training, spurious reliance only occurs when the scale of the spurious features is larger than that of the core features; (iii) adversarial training can have the unintended consequence of reducing distributional robustness, specifically when spurious correlations are changed in the new test domain. Next, we present extensive empirical evidence, using a test suite of twenty adversarially trained models evaluated on five benchmark datasets (ObjectNet, RIVAL10, Salient ImageNet-1M, ImageNet-9, Waterbirds), that adversarially trained classifiers rely on backgrounds more than their standardly trained counterparts, validating our theoretical results. We also show that spurious correlations in training data (when preserved in the test domain) can improve adversarial robustness, revealing that previous claims that adversarial vulnerability is rooted in spurious correlations are incomplete.
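To make the linear regression setting concrete, below is a minimal simulation sketch (not the authors' code): Gaussian data with disjoint core and spurious feature blocks, where the spurious block merely correlates with the label, trained with and without $\ell_2$ adversarial training. For a linear model, the worst-case $\ell_2$ perturbation of radius $\epsilon$ has a closed form, giving the adversarial loss $(|w^\top x - y| + \epsilon \|w\|_2)^2$, so no inner attack loop is needed. All dimensions, scales, and the correlation strength are illustrative assumptions; per result (i) of the abstract, the adversarially trained model should place a larger fraction of its weight on the spurious block.

```python
# Minimal sketch of the paper's linear regression setting (illustrative
# assumptions throughout; not the authors' code).
import numpy as np

rng = np.random.default_rng(0)
n, d_core, d_sp, eps = 5000, 5, 5, 0.5  # sample size, block dims, l2 radius

# Core features carry the true signal; spurious features only correlate with y.
x_core = rng.normal(size=(n, d_core))
y = x_core @ np.ones(d_core) + 0.1 * rng.normal(size=n)
x_sp = 0.5 * y[:, None] + rng.normal(size=(n, d_sp))  # spuriously correlated
X = np.hstack([x_core, x_sp])

def train(adversarial, lr=1e-2, steps=3000):
    # Gradient descent on E[(|w.x - y| + eps * ||w||_2)^2] when adversarial,
    # else on the plain squared loss. The closed form replaces the inner
    # maximization over l2-bounded perturbations.
    w = np.zeros(d_core + d_sp)
    for _ in range(steps):
        r = X @ w - y
        if adversarial:
            m = np.abs(r) + eps * np.linalg.norm(w)
            grad = 2 * (X.T @ (m * np.sign(r)) / n
                        + eps * m.mean() * w / (np.linalg.norm(w) + 1e-12))
        else:
            grad = 2 * X.T @ r / n
        w -= lr * grad
    return w

for adv in (False, True):
    w = train(adv)
    frac_sp = np.abs(w[d_core:]).sum() / np.abs(w).sum()
    print(f"adversarial={adv}: weight fraction on spurious block = {frac_sp:.3f}")
```

Swapping the closed form for $(|w^\top x - y| + \epsilon \|w\|_1)^2$ gives the $\ell_\infty$ analogue, where, per result (ii), increased spurious reliance should appear only when the spurious features have larger scale than the core features.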
