Paper Title
Improving Generalizability of Graph Anomaly Detection Models via Data Augmentation
Paper Authors
Paper Abstract
Graph anomaly detection (GAD) is a vital task since even a few anomalies can pose huge threats to benign users. Recent semi-supervised GAD methods, which can effectively leverage the available labels as prior knowledge, have achieved superior performance over unsupervised methods. In practice, people often need to identify anomalies on new (sub)graphs to secure their business, but they may lack the labels required to train an effective detection model. A natural idea is to directly apply a trained GAD model to the new (sub)graph for testing. However, we find that existing semi-supervised GAD methods suffer from a generalization problem: well-trained models cannot perform well on unseen areas (i.e., areas not accessible during training) of the same graph, which can cause serious trouble. In this paper, we build on this phenomenon and propose a general and novel research problem, generalized graph anomaly detection, which aims to effectively identify anomalies on both the training-domain graph and unseen testing graphs so as to eliminate potential dangers. This is a challenging task, since only limited labels are available and the normal background may differ between training and testing data. Accordingly, we propose a data augmentation method named \textit{AugAN} (\uline{Aug}mentation for \uline{A}nomaly and \uline{N}ormal distributions) to enrich the training data and boost the generalizability of GAD models. Experiments verify the effectiveness of our method in improving model generalizability.
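The generalization gap described in the abstract can be illustrated with a small evaluation sketch: train a semi-supervised detector on labels from one region of a graph, then score both that region and an unseen region of the same graph. The sketch below uses synthetic node features and a plain logistic-regression detector purely as placeholders; it is not the paper's model, its datasets, or the AugAN augmentation itself, only the evaluation protocol that motivates them.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic node features and anomaly labels for two regions of one graph.
# In the paper's setting these would come from a real attributed graph.
n_seen, n_unseen, d = 1000, 1000, 16
X_seen = rng.normal(0.0, 1.0, size=(n_seen, d))
X_unseen = rng.normal(0.5, 1.2, size=(n_unseen, d))  # shifted "normal background"
y_seen = (rng.random(n_seen) < 0.05).astype(int)      # ~5% anomalies
y_unseen = (rng.random(n_unseen) < 0.05).astype(int)
X_seen[y_seen == 1] += 3.0                            # offset anomalies so they are detectable
X_unseen[y_unseen == 1] += 3.0

# Semi-supervised setting: only a small labeled subset of the seen region is available.
anom_idx = np.flatnonzero(y_seen == 1)
norm_idx = np.flatnonzero(y_seen == 0)
labeled = np.concatenate([anom_idx[:10], rng.choice(norm_idx, size=90, replace=False)])
detector = LogisticRegression(max_iter=1000).fit(X_seen[labeled], y_seen[labeled])

# Evaluate on the training-domain region vs. the unseen region of the same graph.
auc_seen = roc_auc_score(y_seen, detector.predict_proba(X_seen)[:, 1])
auc_unseen = roc_auc_score(y_unseen, detector.predict_proba(X_unseen)[:, 1])
print(f"AUC on seen region:   {auc_seen:.3f}")
print(f"AUC on unseen region: {auc_unseen:.3f}")  # the gap between the two scores is what the paper targets
```

In this toy setup the unseen region differs only by a shift in its normal background; in the paper's setting, the shift arises from structurally different, unlabeled areas of real graphs, which is the gap AugAN's augmentation of anomaly and normal distributions is designed to close.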