Title
SPADE: Semi-supervised Anomaly Detection under Distribution Mismatch
Authors
Abstract
Semi-supervised anomaly detection is a common problem, because datasets containing anomalies are often only partially labeled. We propose a canonical framework, Semi-supervised Pseudo-labeler Anomaly Detection with Ensembling (SPADE), that is not limited by the assumption that labeled and unlabeled data come from the same distribution. This assumption is violated in many applications: for example, the labeled data may contain only anomalies, unlike the unlabeled data; the unlabeled data may contain different types of anomalies; or the labeled data may contain only 'easy-to-label' samples. SPADE uses an ensemble of one-class classifiers as the pseudo-labeler to make pseudo-labeling more robust to distribution mismatch. Partial matching is proposed to automatically select the critical hyper-parameters for pseudo-labeling without validation data, which is crucial when labeled data is limited. SPADE shows state-of-the-art semi-supervised anomaly detection performance across a wide range of scenarios with distribution mismatch in both tabular and image domains. In some common real-world settings, such as a model facing new types of unlabeled anomalies, SPADE outperforms the state-of-the-art alternatives by 5% AUC on average.
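The idea of using an ensemble of one-class classifiers as a pseudo-labeler can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the abstract: the centroid-distance scorer standing in for a real one-class classifier, the disjoint random split of the unlabeled data, the unanimous-vote rule, and the fixed quantile thresholds `low_q` and `high_q`. In particular, the abstract says SPADE selects such hyper-parameters through partial matching, which this sketch does not reproduce.

```python
import math
import random


def fit_centroid_occ(points):
    """Hypothetical stand-in for a one-class classifier: keep the centroid."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def anomaly_score(centroid, x):
    """Distance to the centroid; a larger value means more anomalous."""
    return math.dist(centroid, x)


def pseudo_label(unlabeled, n_models=3, low_q=0.5, high_q=0.95, seed=0):
    """Return 1 (anomalous), 0 (normal), or None (left unlabeled) per sample.

    Assumed rule: a sample gets a pseudo-label only when every ensemble
    member agrees; disagreement leaves it unlabeled.
    """
    rng = random.Random(seed)
    indices = list(range(len(unlabeled)))
    rng.shuffle(indices)
    # Each ensemble member is fit on its own disjoint slice of the data.
    subsets = [indices[k::n_models] for k in range(n_models)]

    votes = [[] for _ in unlabeled]
    for subset in subsets:
        centroid = fit_centroid_occ([unlabeled[i] for i in subset])
        scores = [anomaly_score(centroid, x) for x in unlabeled]
        ranked = sorted(scores)
        # Fixed quantile thresholds (an assumption; not partial matching).
        low = ranked[int(low_q * (len(ranked) - 1))]
        high = ranked[int(high_q * (len(ranked) - 1))]
        for i, s in enumerate(scores):
            votes[i].append(1 if s > high else (0 if s <= low else None))

    labels = []
    for sample_votes in votes:
        if all(v == 1 for v in sample_votes):
            labels.append(1)
        elif all(v == 0 for v in sample_votes):
            labels.append(0)
        else:
            labels.append(None)
    return labels
```

Requiring agreement across members trained on different slices is one way to make pseudo-labels conservative: a sample that only one member considers anomalous stays unlabeled rather than being given a possibly wrong label.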