Title
Deep Partition Aggregation: Provable Defense against General Poisoning Attacks
Authors
Abstract
Adversarial poisoning attacks distort training data in order to corrupt the test-time behavior of a classifier. A provable defense provides a certificate for each test sample: a lower bound on the magnitude of any adversarial distortion of the training set that could corrupt the test sample's classification. We propose two novel provable defenses against poisoning attacks: (i) Deep Partition Aggregation (DPA), a certified defense against a general poisoning threat model, defined as the insertion or deletion of a bounded number of samples in the training set -- by implication, this threat model also includes arbitrary distortions of a bounded number of images and/or labels; and (ii) Semi-Supervised DPA (SS-DPA), a certified defense against label-flipping poisoning attacks. DPA is an ensemble method in which base models are trained on partitions of the training set determined by a hash function. DPA is related both to subset aggregation, a well-studied ensemble method in classical machine learning, and to randomized smoothing, a popular provable defense against evasion attacks. Our defense against label-flipping attacks, SS-DPA, uses a semi-supervised learning algorithm as its base classifier model: each base classifier is trained on the entire unlabeled training set in addition to the labels for one partition. SS-DPA significantly outperforms the existing certified defense against label-flipping attacks on both MNIST and CIFAR-10: it provably tolerates, for at least half of test images, over 600 label flips (vs. fewer than 200) on MNIST and over 300 label flips (vs. 175) on CIFAR-10. Against general poisoning attacks, where no prior certified defense exists, DPA certifies at least 50% of test images against over 500 poison image insertions on MNIST, and nine insertions on CIFAR-10. These results establish a new state of the art in provable defenses against poisoning attacks.
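The core DPA mechanism described in the abstract -- hash-based partitioning, plurality voting, and a certificate derived from the vote gap -- can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the generic `hash_fn`, and the tie-breaking-by-smaller-class-index rule are assumptions for the sketch; the key property used is that each inserted or deleted training sample lands in exactly one partition and so changes at most one base model's vote.

```python
from collections import Counter

def partition(dataset, k, hash_fn=hash):
    """Deterministically split a dataset into k partitions by hashing each
    sample, so that inserting or deleting one sample affects exactly one
    partition (and hence at most one base classifier)."""
    parts = [[] for _ in range(k)]
    for sample in dataset:
        parts[hash_fn(sample) % k].append(sample)
    return parts

def dpa_predict(base_preds):
    """Aggregate base-classifier predictions by plurality vote.

    Returns (prediction, certified_poisoning_size): the prediction is
    unchanged by any attack that inserts/deletes at most
    certified_poisoning_size training samples, since each such sample can
    flip at most one base vote (shrinking the top/runner-up gap by at
    most 2 per poisoned sample).
    """
    counts = Counter(base_preds)
    (top, n_top), *rest = counts.most_common()
    runner, n_runner = rest[0] if rest else (None, 0)
    # Assumed tie-breaking: the smaller class index wins a tied vote,
    # so the certificate loses 1 when the runner-up has a smaller index.
    tie_break = 1 if runner is not None and runner < top else 0
    cert = (n_top - n_runner - tie_break) // 2
    return top, cert
```

For example, with 9 base models voting `[1]*7 + [0]*2`, the gap is 5 and class 0 would win ties, so the prediction 1 is certified against 2 poisoned samples. In practice a stable hash of the serialized sample (rather than Python's built-in `hash`) would be used so partitions are reproducible across runs.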