Paper Title
Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching
Paper Authors
Paper Abstract
Data Poisoning attacks modify training data to maliciously control a model trained on such data. In this work, we focus on targeted poisoning attacks which cause a reclassification of an unmodified test image and as such breach model integrity. We consider a particularly malicious poisoning attack that is both "from scratch" and "clean label", meaning we analyze an attack that successfully works against new, randomly initialized models, and is nearly imperceptible to humans, all while perturbing only a small fraction of the training data. Previous poisoning attacks against deep neural networks in this setting have been limited in scope and success, working only in simplified settings or being prohibitively expensive for large datasets. The central mechanism of the new attack is matching the gradient direction of malicious examples. We analyze why this works, supplement it with practical considerations, and show its threat to real-world practitioners, finding that it is the first poisoning method to cause targeted misclassification in modern deep networks trained from scratch on a full-sized, poisoned ImageNet dataset. Finally, we demonstrate the limitations of existing defensive strategies against such an attack, concluding that data poisoning is a credible threat, even for large-scale deep learning systems.
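The central mechanism named in the abstract, matching the gradient direction of the poisoned examples to an adversarial objective on the target image, can be made concrete with a short sketch. The following is a minimal, illustrative PyTorch version of such a gradient-matching loss; the names `model`, `target`, `adv_label`, `poison_batch`, and `gradient_matching_loss` are our own assumptions for illustration, not code from the paper.

```python
# Illustrative sketch (not the authors' implementation): the attacker
# perturbs a small set of training images so that their training gradient
# points in the same direction as the gradient that would make the model
# misclassify the unmodified target image.
import torch
import torch.nn.functional as F

def gradient_matching_loss(model, target, adv_label, poison_batch, poison_labels):
    """1 - cosine similarity between the adversarial gradient on the target
    and the training gradient on the candidate poison examples."""
    params = [p for p in model.parameters() if p.requires_grad]

    # Gradient the attacker wants training to follow: classify the clean
    # target image as the adversarial label.
    adv_loss = F.cross_entropy(model(target), adv_label)
    adv_grads = torch.autograd.grad(adv_loss, params)

    # Gradient the poisoned images produce under their *clean* (unchanged)
    # labels; create_graph=True keeps this loss differentiable with respect
    # to the poison perturbations being optimized.
    poison_loss = F.cross_entropy(model(poison_batch), poison_labels)
    poison_grads = torch.autograd.grad(poison_loss, params, create_graph=True)

    # Cosine similarity between the two flattened gradient directions.
    dot = sum((ga * gp).sum() for ga, gp in zip(adv_grads, poison_grads))
    adv_norm = torch.sqrt(sum(ga.pow(2).sum() for ga in adv_grads))
    poison_norm = torch.sqrt(sum(gp.pow(2).sum() for gp in poison_grads))
    return 1.0 - dot / (adv_norm * poison_norm + 1e-12)
```

In such an attack, this loss would be minimized over small, imperceptibility-bounded perturbations of `poison_batch` while the labels stay untouched, which is what makes the attack "clean label": a model later trained on the poisoned data follows gradients aligned with the adversarial objective on the target.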