Paper Title

Evaluating Ensemble Robustness Against Adversarial Attacks

Paper Authors

George Adam, Romain Speciel

Abstract

Adversarial examples, which are slightly perturbed inputs generated with the aim of fooling a neural network, are known to transfer between models; adversaries which are effective on one model will often fool another. This concept of transferability poses grave security concerns, as it leads to the possibility of attacking models in a black-box setting, in which the internal parameters of the target model are unknown. In this paper, we seek to analyze and minimize the transferability of adversaries between models within an ensemble. To this end, we introduce a gradient-based measure of how effectively an ensemble's constituent models collaborate to reduce the space of adversarial examples targeting the ensemble itself. Furthermore, we demonstrate that this measure can be utilized during training to increase an ensemble's robustness to adversarial examples.
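The abstract describes the gradient-based collaboration measure only at a high level. The sketch below is a minimal, illustrative reading of that idea, assuming the measure is the average pairwise cosine similarity of the members' input gradients and that it is added as a regularization term during training; this is an assumption for illustration, not the paper's confirmed formulation, and the function and parameter names (gradient_alignment, training_step, lam) are hypothetical.

```python
import torch
import torch.nn.functional as F

def gradient_alignment(models, x, y):
    """Illustrative measure (assumed, not the paper's exact metric):
    average pairwise cosine similarity of the input gradients of the
    ensemble's constituent models. Lower values suggest the members'
    loss surfaces are less aligned, so a perturbation crafted against
    one member is less likely to transfer to the others."""
    grads = []
    for model in models:
        x_in = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_in), y)
        # create_graph=True so the measure itself can be backpropagated
        # through when it is used as a training penalty below.
        grad, = torch.autograd.grad(loss, x_in, create_graph=True)
        grads.append(grad.flatten(start_dim=1))
    sims = []
    for i in range(len(grads)):
        for j in range(i + 1, len(grads)):
            sims.append(F.cosine_similarity(grads[i], grads[j], dim=1).mean())
    return torch.stack(sims).mean()

def training_step(models, optimizer, x, y, lam=0.5):
    """Hypothetical training step: the usual classification loss plus a
    penalty on gradient alignment, so members learn less-transferable
    gradients. lam trades off accuracy against ensemble diversity."""
    optimizer.zero_grad()
    ce = sum(F.cross_entropy(m(x), y) for m in models) / len(models)
    loss = ce + lam * gradient_alignment(models, x, y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

The key design point in this reading is computing the alignment term with a differentiable graph (create_graph=True), so minimizing it during training directly pushes the members' input gradients apart rather than merely reporting their similarity after the fact.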
