Paper Title
RAB: Provable Robustness Against Backdoor Attacks
Paper Authors
Paper Abstract
Recent studies have shown that deep neural networks (DNNs) are vulnerable to adversarial attacks, including evasion and backdoor (poisoning) attacks. On the defense side, there have been intensive efforts to improve both the empirical and the provable robustness against evasion attacks; however, provable robustness against backdoor attacks remains largely unexplored. In this paper, we focus on certifying machine learning model robustness against general threat models, especially backdoor attacks. We first provide a unified framework based on randomized smoothing and show how it can be instantiated to certify robustness against both evasion and backdoor attacks. We then propose the first robust training process, RAB, which smooths the trained model and certifies its robustness against backdoor attacks. We prove a robustness bound for machine learning models trained with RAB and show that this bound is tight. In addition, we theoretically show that robust smoothed models can be trained efficiently for simple models such as K-nearest-neighbor (K-NN) classifiers, and we propose an exact smooth-training algorithm that eliminates the need to sample from a noise distribution for such models. Empirically, we conduct comprehensive experiments with different machine learning (ML) models, such as DNNs, support vector machines, and K-NN models, on the MNIST, CIFAR-10, and ImageNette datasets, and provide the first benchmark for certified robustness against backdoor attacks. In addition, we evaluate K-NN models on the Spambase tabular dataset to demonstrate the advantages of the proposed exact algorithm. Both the theoretical analysis and the comprehensive evaluation on diverse ML models and datasets shed light on further robust learning strategies against general training-time attacks.
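Since the abstract describes the RAB pipeline only at a high level, the following is a minimal sketch of training-time randomized smoothing in that spirit: an ensemble of classifiers is trained on independently noised copies of the (possibly poisoned) training set, and predictions are aggregated by majority vote on noised test inputs. The helper names `train_model` and `model.predict`, as well as the parameters `sigma` and `n_models`, are illustrative assumptions and not the authors' implementation.

```python
import numpy as np

def rab_smooth_train(X_train, y_train, train_model, n_models=100, sigma=1.0, seed=0):
    """Train an ensemble on independently noised copies of the training set.

    `train_model(X, y)` is a hypothetical user-supplied trainer returning a
    model with an sklearn-style `.predict`. Adding noise to every training
    input is what "smooths out" a backdoor pattern across the ensemble.
    """
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        noise = rng.normal(0.0, sigma, size=X_train.shape)
        models.append(train_model(X_train + noise, y_train))
    return models

def rab_smooth_predict(models, x, num_classes, sigma=1.0, seed=0):
    """Aggregate ensemble predictions on a noised test input by majority vote.

    Assumes integer class labels in {0, ..., num_classes - 1}.
    """
    rng = np.random.default_rng(seed)
    votes = np.zeros(num_classes, dtype=int)
    for model in models:
        x_noisy = x + rng.normal(0.0, sigma, size=x.shape)
        votes[model.predict(x_noisy[None])[0]] += 1
    # The margin between the top two vote counts is what a certification
    # procedure would convert into a robustness guarantee.
    return int(votes.argmax()), votes
```

In the certification described by the paper, the guarantee against backdoor perturbations of the training data is derived from the smoothing noise level and the prediction margin of the smoothed ensemble; the sketch above only illustrates the smoothing and voting steps, not the certified-bound computation.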