Paper Title
Toward Adversarial Robustness via Semi-supervised Robust Training
Paper Authors
Paper Abstract
Adversarial examples have been shown to be a severe threat to deep neural networks (DNNs). One of the most effective adversarial defense methods is adversarial training (AT), which minimizes the adversarial risk $R_{adv}$ and thereby encourages both the benign example $x$ and its adversarially perturbed neighbors within the $\ell_{p}$-ball to be predicted as the ground-truth label. In this work, we propose a novel defense method, robust training (RT), which jointly minimizes two separate risks ($R_{stand}$ and $R_{rob}$), defined with respect to the benign example and its neighborhood, respectively. The motivation is to explicitly and jointly enhance accuracy and adversarial robustness. We prove that $R_{adv}$ is upper-bounded by $R_{stand} + R_{rob}$, which implies that RT has a similar effect to AT. Intuitively, minimizing the standard risk enforces correct prediction of the benign example, while minimizing the robust risk encourages the predictions on neighboring examples to be consistent with the prediction on the benign example. Besides, since $R_{rob}$ is independent of the ground-truth label, RT naturally extends to a semi-supervised mode ($i.e.$, SRT), which further enhances adversarial robustness. Moreover, we extend the $\ell_{p}$-bounded neighborhood to a general case that covers different types of perturbations, such as pixel-wise perturbations ($i.e.$, $x + \delta$) or spatial perturbations ($i.e.$, $Ax + b$). Extensive experiments on benchmark datasets not only verify the superiority of the proposed SRT method over state-of-the-art methods in defending against pixel-wise or spatial perturbations separately, but also demonstrate its robustness when both perturbations are applied simultaneously. The code for reproducing the main results is available at \url{https://github.com/THUYimingLi/Semi-supervised_Robust_Training}.
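To make the described objective concrete, below is a minimal PyTorch-style sketch of how the two risks could be combined in an SRT-style loss. The `perturb` callable, the `lambda_rob` weight, and the KL-based consistency term are illustrative assumptions rather than the authors' exact formulation; refer to the linked repository for the official implementation.

```python
import torch
import torch.nn.functional as F

def srt_loss(model, x_labeled, y_labeled, x_unlabeled, perturb, lambda_rob=1.0):
    """Sketch of a semi-supervised robust training (SRT)-style objective.

    R_stand: cross-entropy on labeled benign examples.
    R_rob:   consistency (KL divergence) between the prediction on a benign
             example and the prediction on a perturbed neighbor; it uses no
             labels, so unlabeled data can also contribute to this term.
    `perturb` is a hypothetical callable x -> x' producing a neighbor of x,
    e.g. a pixel-wise (x + delta) or spatial (Ax + b) perturbation.
    """
    # Standard risk R_stand on labeled benign examples.
    r_stand = F.cross_entropy(model(x_labeled), y_labeled)

    # Robust risk R_rob on labeled + unlabeled examples (label-free).
    x_all = torch.cat([x_labeled, x_unlabeled], dim=0)
    with torch.no_grad():
        p_benign = F.softmax(model(x_all), dim=1)      # reference predictions
    log_p_neighbor = F.log_softmax(model(perturb(x_all)), dim=1)
    r_rob = F.kl_div(log_p_neighbor, p_benign, reduction="batchmean")

    return r_stand + lambda_rob * r_rob
```

In a training loop, `perturb` could for instance be instantiated with a PGD-style pixel perturbation or a random affine transform, and the returned loss back-propagated as usual; dropping the unlabeled batch recovers a fully supervised RT-style variant.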