Paper Title

Transferring Adversarial Robustness Through Robust Representation Matching

Authors

Pratik Vaishnavi, Kevin Eykholt, Amir Rahmati

Abstract

With the widespread use of machine learning, concerns over its security and reliability have become prevalent. As such, many have developed defenses to harden neural networks against adversarial examples, imperceptibly perturbed inputs that are reliably misclassified. Adversarial training in which adversarial examples are generated and used during training is one of the few known defenses able to reliably withstand such attacks against neural networks. However, adversarial training imposes a significant training overhead and scales poorly with model complexity and input dimension. In this paper, we propose Robust Representation Matching (RRM), a low-cost method to transfer the robustness of an adversarially trained model to a new model being trained for the same task irrespective of architectural differences. Inspired by student-teacher learning, our method introduces a novel training loss that encourages the student to learn the teacher's robust representations. Compared to prior works, RRM is superior with respect to both model performance and adversarial training time. On CIFAR-10, RRM trains a robust model $\sim 1.8\times$ faster than the state-of-the-art. Furthermore, RRM remains effective on higher-dimensional datasets. On Restricted-ImageNet, RRM trains a ResNet50 model $\sim 18\times$ faster than standard adversarial training.
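The abstract describes a student-teacher objective in which a novel loss term encourages the student to learn the teacher's robust representations. As a rough illustration of that idea, here is a minimal, framework-free sketch of such a combined objective: a standard classification loss plus a penalty pulling the student's intermediate representation toward the frozen robust teacher's. The function names, the ℓ2 form of the matching penalty, and the `weight` hyperparameter are illustrative assumptions, not the paper's exact formulation.

```python
import math

def cross_entropy(logits, label):
    # Softmax cross-entropy for a single example (numerically stabilized).
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return -math.log(exps[label] / total)

def representation_matching_loss(student_logits, label,
                                 student_repr, teacher_repr, weight=1.0):
    # Hypothetical combined objective: task loss on the student's
    # predictions, plus an l2 penalty that matches the student's
    # representation to the (frozen) adversarially trained teacher's.
    # RRM's actual loss term may differ; this only sketches the idea.
    task_loss = cross_entropy(student_logits, label)
    match_loss = sum((s - t) ** 2 for s, t in zip(student_repr, teacher_repr))
    return task_loss + weight * match_loss
```

When the student's representation already equals the teacher's, the penalty vanishes and only the ordinary classification loss remains; the `weight` knob trades off task accuracy against how closely the student mimics the teacher's robust features.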
