Paper Title
MAZE: Data-Free Model Stealing Attack Using Zeroth-Order Gradient Estimation
Paper Authors
Paper Abstract
Model Stealing (MS) attacks allow an adversary with black-box access to a Machine Learning model to replicate its functionality, compromising the confidentiality of the model. Such attacks train a clone model by using the predictions of the target model for different inputs. The effectiveness of such attacks relies heavily on the availability of data necessary to query the target model. Existing attacks either assume partial access to the dataset of the target model or availability of an alternate dataset with semantic similarities. This paper proposes MAZE -- a data-free model stealing attack using zeroth-order gradient estimation. In contrast to prior works, MAZE does not require any data and instead creates synthetic data using a generative model. Inspired by recent works in data-free Knowledge Distillation (KD), we train the generative model using a disagreement objective to produce inputs that maximize disagreement between the clone and the target model. However, unlike the white-box setting of KD, where the gradient information is available, training a generator for model stealing requires performing black-box optimization, as it involves accessing the target model under attack. MAZE relies on zeroth-order gradient estimation to perform this optimization and enables a highly accurate MS attack. Our evaluation with four datasets shows that MAZE provides a normalized clone accuracy in the range of 0.91x to 0.99x, and outperforms even the recent attacks that rely on partial data (JBDA, clone accuracy 0.13x to 0.69x) and surrogate data (KnockoffNets, clone accuracy 0.52x to 0.97x). We also study an extension of MAZE in the partial-data setting and develop MAZE-PD, which generates synthetic data closer to the target distribution. MAZE-PD further improves the clone accuracy (0.97x to 1.0x) and reduces the queries required for the attack by 2x-24x.
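The abstract's key idea is that the generator cannot be trained by ordinary backpropagation, because gradients through the black-box target model are unavailable; instead, gradients are approximated from query outputs alone. A minimal sketch of this zeroth-order idea is shown below, using simple coordinate-wise finite differences on an arbitrary black-box scalar loss. This is an illustrative estimator, not the paper's exact one (MAZE uses a more query-efficient random-direction variant); the function names and parameters here are assumptions for illustration.

```python
import numpy as np

def zo_gradient(f, x, eps=1e-5):
    """Estimate the gradient of a black-box scalar function f at x
    using forward finite differences, one query per coordinate.

    Illustrative sketch only: MAZE itself averages finite differences
    over random directions, which needs far fewer queries in high
    dimensions than this coordinate-wise version.
    """
    x = np.asarray(x, dtype=float)
    fx = f(x)                      # one baseline query to the black box
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        # perturb one coordinate and query the black box again
        grad[i] = (f(x + e) - fx) / eps
    return grad

# Example: f(x) = ||x||^2 has true gradient 2x.
f = lambda x: float(np.sum(x ** 2))
g = zo_gradient(f, np.array([1.0, 2.0]))
print(g)  # close to [2.0, 4.0]
```

In the attack setting, `f` would be the disagreement loss computed from the target model's query responses, and the estimated gradient would drive the generator's parameter updates in place of a true backpropagated gradient.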