Paper Title
Adversarial Eigen Attack on Black-Box Models
Authors
Abstract
The black-box adversarial attack has attracted considerable research interest for its practical use in AI safety. Compared with the white-box attack, the black-box setting is more difficult: less information about the attacked model is available, and an additional constraint is placed on the query budget. A general way to improve attack efficiency is to draw support from a pre-trained, transferable white-box model. In this paper, we propose a novel setting of transferable black-box attack: attackers may use external information from a pre-trained model with available network parameters; however, unlike previous studies, no additional training data is permitted to further change or tune the pre-trained model. To this end, we propose a new algorithm, EigenBA, to tackle this problem. By leveraging the Jacobian matrix of the pre-trained white-box model, our method explores more gradient information of the black-box model and improves attack efficiency while keeping the perturbation to the original attacked image small. We show that the optimal perturbations are closely related to the right singular vectors of the Jacobian matrix. Experiments on ImageNet and CIFAR-10 show that even a fixed, non-tunable pre-trained white-box model can significantly boost the efficiency of the black-box attack, and that our proposed method further improves attack efficiency.
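To illustrate why the right singular vectors of the Jacobian matter, here is a minimal toy sketch (not the paper's actual EigenBA algorithm): for a linear white-box feature map, the top right singular vector of its Jacobian is the unit input direction that produces the largest change in the model's output per unit of input perturbation. All names and dimensions below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "white-box" linear feature map f(x) = W x; its Jacobian is W everywhere.
# (A real white-box network would require computing the Jacobian at x.)
in_dim, out_dim = 16, 8
W = rng.standard_normal((out_dim, in_dim))

# SVD: rows of vt are the right singular vectors of the Jacobian.
_, s, vt = np.linalg.svd(W, full_matrices=False)
v1 = vt[0]  # top right singular vector (unit norm): direction of largest gain

x = rng.standard_normal(in_dim)
eps = 0.1
x_adv = x + eps * v1  # candidate perturbation along the top singular direction

# Output change per unit input change along v1 equals the top singular value,
# the maximum achievable over all unit-norm input directions.
gain = np.linalg.norm(W @ (x_adv - x)) / np.linalg.norm(x_adv - x)
assert np.isclose(gain, s[0])
```

Probing such directions in decreasing order of singular value is one way a fixed white-box Jacobian can guide query-efficient search, which is the intuition the abstract appeals to.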