Paper Title
Adversarial Concurrent Training: Optimizing Robustness and Accuracy Trade-off of Deep Neural Networks
Paper Authors
Paper Abstract
Adversarial training has been proven to be an effective technique for improving the adversarial robustness of models. However, there seems to be an inherent trade-off between optimizing the model for accuracy and robustness. To this end, we propose Adversarial Concurrent Training (ACT), which employs adversarial training in a collaborative learning framework whereby we train a robust model in conjunction with a natural model in a minimax game. ACT encourages the two models to align their feature spaces by using the task-specific decision boundaries and to explore the input space more broadly. Furthermore, the natural model acts as a regularizer, enforcing priors on the features that the robust model should learn. Our analysis of the models' behavior shows that ACT leads to a robust model with lower model complexity, higher information compression in the learned representations, and high-posterior-entropy solutions indicative of convergence to a flatter minimum. We demonstrate the effectiveness of the proposed approach across different datasets and network architectures. On ImageNet, ACT achieves 68.20% standard accuracy and 44.29% robust accuracy under a 100-iteration untargeted attack, improving upon the standard adversarial training method's 65.70% standard accuracy and 42.36% robust accuracy.
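The abstract describes ACT's core mechanism: a robust model and a natural model are trained concurrently, each fitting its own inputs (adversarial vs. clean) while aligning its predictions with the other. The sketch below illustrates one plausible training step under assumed details: a single-step FGSM attack stands in for the paper's inner maximization, the KL-divergence alignment terms and the weighting `alpha` are illustrative choices, and `fgsm_attack` / `act_step` are hypothetical helper names, not the authors' code.

```python
# Hypothetical sketch of one Adversarial Concurrent Training (ACT) step.
# The attack, alignment losses, and weighting below are illustrative
# assumptions, not the paper's exact formulation.
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps=8 / 255):
    """Single-step adversarial example (FGSM) against `model`."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad, = torch.autograd.grad(loss, x_adv)
    return (x_adv + eps * grad.sign()).clamp(0, 1).detach()

def act_step(robust_model, natural_model, opt_r, opt_n, x, y, alpha=0.5):
    """One collaborative step: the robust model trains on adversarial
    inputs and aligns with the natural model; the natural model trains
    on clean inputs and aligns with the robust model."""
    x_adv = fgsm_attack(robust_model, x, y)

    # Robust model: adversarial cross-entropy + KL alignment to the
    # (frozen) natural model's clean predictions.
    p_nat = F.softmax(natural_model(x).detach(), dim=1)
    logits_r = robust_model(x_adv)
    loss_r = F.cross_entropy(logits_r, y) + alpha * F.kl_div(
        F.log_softmax(logits_r, dim=1), p_nat, reduction="batchmean")
    opt_r.zero_grad()
    loss_r.backward()
    opt_r.step()

    # Natural model: clean cross-entropy + KL alignment to the (frozen)
    # robust model's adversarial predictions.
    p_rob = F.softmax(robust_model(x_adv).detach(), dim=1)
    logits_n = natural_model(x)
    loss_n = F.cross_entropy(logits_n, y) + alpha * F.kl_div(
        F.log_softmax(logits_n, dim=1), p_rob, reduction="batchmean")
    opt_n.zero_grad()
    loss_n.backward()
    opt_n.step()
    return loss_r.item(), loss_n.item()
```

The mutual KL terms realize the "align their feature spaces" idea from the abstract at the output level; in practice the alignment could also be applied to intermediate representations.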