Title
Deep Partition Aggregation: Provable Defense against General Poisoning Attacks
Authors
Abstract
Adversarial poisoning attacks distort training data in order to corrupt the test-time behavior of a classifier. A provable defense provides a certificate for each test sample: a lower bound on the magnitude of any adversarial distortion of the training set that could corrupt the test sample's classification. We propose two novel provable defenses against poisoning attacks: (i) Deep Partition Aggregation (DPA), a certified defense against a general poisoning threat model, defined as the insertion or deletion of a bounded number of samples in the training set -- by implication, this threat model also includes arbitrary distortions of a bounded number of images and/or labels; and (ii) Semi-Supervised DPA (SS-DPA), a certified defense against label-flipping poisoning attacks. DPA is an ensemble method in which base models are trained on partitions of the training set determined by a hash function. DPA is related both to subset aggregation, a well-studied ensemble method in classical machine learning, and to randomized smoothing, a popular provable defense against evasion attacks. Our defense against label-flipping attacks, SS-DPA, uses a semi-supervised learning algorithm as its base classifier model: each base classifier is trained on the entire unlabeled training set in addition to the labels for one partition. SS-DPA significantly outperforms the existing certified defense against label-flipping attacks on both MNIST and CIFAR-10: it provably tolerates, for at least half of test images, over 600 label flips (vs. fewer than 200) on MNIST and over 300 label flips (vs. 175) on CIFAR-10. Against general poisoning attacks, where no prior certified defense exists, DPA certifies at least 50% of test images against over 500 poison image insertions on MNIST, and nine insertions on CIFAR-10. These results establish a new state of the art in provable defenses against poisoning attacks.
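The core DPA mechanism described in the abstract -- hash-based partitioning, plurality voting, and a certificate derived from the vote gap -- can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the generic `hash_fn`, and the tie-breaking-by-smaller-class-index rule are assumptions for the sketch; the key property used is that each inserted or deleted training sample lands in exactly one partition and so changes at most one base model's vote.

```python
from collections import Counter

def partition(dataset, k, hash_fn=hash):
    """Deterministically split a dataset into k partitions by hashing each
    sample, so that inserting or deleting one sample affects exactly one
    partition (and hence at most one base classifier)."""
    parts = [[] for _ in range(k)]
    for sample in dataset:
        parts[hash_fn(sample) % k].append(sample)
    return parts

def dpa_predict(base_preds):
    """Aggregate base-classifier predictions by plurality vote.

    Returns (prediction, certified_poisoning_size): the prediction is
    unchanged by any attack that inserts/deletes at most
    certified_poisoning_size training samples, since each such sample can
    flip at most one base vote (shrinking the top/runner-up gap by at
    most 2 per poisoned sample).
    """
    counts = Counter(base_preds)
    (top, n_top), *rest = counts.most_common()
    runner, n_runner = rest[0] if rest else (None, 0)
    # Assumed tie-breaking: the smaller class index wins a tied vote,
    # so the certificate loses 1 when the runner-up has a smaller index.
    tie_break = 1 if runner is not None and runner < top else 0
    cert = (n_top - n_runner - tie_break) // 2
    return top, cert
```

For example, with 9 base models voting `[1]*7 + [0]*2`, the gap is 5 and class 0 would win ties, so the prediction 1 is certified against 2 poisoned samples. In practice a stable hash of the serialized sample (rather than Python's built-in `hash`) would be used so partitions are reproducible across runs.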