Paper Title
Transferable Perturbations of Deep Feature Distributions
Paper Authors
Paper Abstract
Almost all current adversarial attacks of CNN classifiers rely on information derived from the output layer of the network. This work presents a new adversarial attack based on the modeling and exploitation of class-wise and layer-wise deep feature distributions. We achieve state-of-the-art targeted blackbox transfer-based attack results for undefended ImageNet models. Further, we place a priority on explainability and interpretability of the attacking process. Our methodology affords an analysis of how adversarial attacks change the intermediate feature distributions of CNNs, as well as a measure of layer-wise and class-wise feature distributional separability/entanglement. We also conceptualize a transition from task/data-specific to model-specific features within a CNN architecture that directly impacts the transferability of adversarial examples.