Paper Title
Defending Distributed Classifiers Against Data Poisoning Attacks
Paper Authors
Paper Abstract
Support Vector Machines (SVMs) are vulnerable to targeted training data manipulations such as poisoning attacks and label flips. By carefully manipulating a subset of training samples, the attacker forces the learner to compute an incorrect decision boundary, thereby causing misclassifications. Considering the increased importance of SVMs in engineering and life-critical applications, we develop a novel defense algorithm that improves resistance against such attacks. Local Intrinsic Dimensionality (LID) is a promising metric that characterizes the outlierness of data samples. In this work, we introduce a new approximation of LID called K-LID that uses kernel distance in the LID calculation, which allows LID to be computed in high-dimensional transformed spaces. We introduce a weighted SVM that uses K-LID as a distinguishing characteristic to de-emphasize the effect of suspicious data samples on the SVM decision boundary. Each sample is weighted by how likely its K-LID value is to come from the benign K-LID distribution rather than the attacked K-LID distribution. We then demonstrate how the proposed defense can be applied to a distributed SVM framework through a case study on an SDR-based surveillance system. Experiments with benchmark datasets show that the proposed defense reduces classification error rates substantially (by 10% on average).
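A minimal sketch of the scheme the abstract describes, assuming the standard maximum-likelihood LID estimator, an RBF kernel, and Gaussian kernel-density fits for the benign and attacked K-LID distributions. All of these are illustrative choices (as are the helper names rbf_kernel, k_lid, and lid_weights); the paper's exact estimator, kernel, and distribution fits may differ:

    import numpy as np
    from scipy.stats import gaussian_kde
    from sklearn.svm import SVC

    def rbf_kernel(X, gamma=0.1):
        """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
        sq = (X ** 2).sum(1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
        return np.exp(-gamma * np.maximum(d2, 0.0))

    def k_lid(X, k=20, gamma=0.1):
        """K-LID per sample: the MLE estimator of LID,
        -(1/k * sum_i ln(r_i / r_k))^{-1}, evaluated on the kernel-induced
        distance d(x, y) = sqrt(K(x,x) - 2 K(x,y) + K(y,y)), so that
        neighborhoods are measured in the transformed feature space."""
        K = rbf_kernel(X, gamma)
        diag = np.diag(K)
        D = np.sqrt(np.maximum(diag[:, None] - 2.0 * K + diag[None, :], 0.0))
        np.fill_diagonal(D, np.inf)             # exclude self-distance
        r = np.sort(D, axis=1)[:, :k]           # k nearest-neighbor distances
        ratios = np.maximum(r / r[:, -1:], 1e-12)
        return -1.0 / np.log(ratios).mean(axis=1)

    def lid_weights(lids, benign_lids, attacked_lids, eps=1e-12):
        """Weight each training sample by how likely its K-LID value is
        to come from the benign distribution rather than the attacked one
        (hypothetical: densities fitted here with Gaussian KDE)."""
        p_benign = gaussian_kde(benign_lids)(lids)
        p_attack = gaussian_kde(attacked_lids)(lids)
        return p_benign / (p_benign + p_attack + eps)

    # Usage: down-weight suspicious samples when fitting the SVM.
    # benign_lids / attacked_lids would come from reference data; the
    # slices below are placeholders for that calibration step.
    X = np.random.randn(200, 10)
    y = np.random.randint(0, 2, 200)
    lids = k_lid(X)
    w = lid_weights(lids, benign_lids=lids[:100], attacked_lids=lids[100:])
    SVC(kernel="rbf", gamma=0.1).fit(X, y, sample_weight=w)

Samples whose K-LID values look more typical of attacked data receive weights near zero, so they contribute little to the decision boundary, while benign-looking samples keep weights near one.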