Little Help Makes a Big Difference: Leveraging Active Learning to Improve Unsupervised Time Series Anomaly Detection

Paper Title

Little Help Makes a Big Difference: Leveraging Active Learning to Improve Unsupervised Time Series Anomaly Detection

Authors

Hamza Bodor, Thai V. Hoang, Zonghua Zhang

Abstract

Key Performance Indicators (KPIs), which are essentially time series data, have been widely used to indicate the performance of telecom networks. Based on the given KPIs, a large set of anomaly detection algorithms has been deployed to detect unexpected network incidents. Generally, unsupervised anomaly detection algorithms are more popular than supervised ones, because labeling KPIs is extremely time- and resource-consuming, and error-prone. However, those unsupervised anomaly detection algorithms often suffer from excessive false alarms, especially in the presence of concept drifts resulting from network re-configurations or maintenance. To tackle this challenge and improve the overall performance of unsupervised anomaly detection algorithms, we propose to use active learning to introduce and benefit from the feedback of operators, who can verify the alarms (both false and true ones) and label the corresponding KPIs with reasonable effort. Specifically, we develop three query strategies to select the most informative and representative samples to label. We also develop an efficient method to update the weights of Isolation Forest and optimally adjust the decision threshold, so as to eventually improve the performance of the detection model. Experiments with one public dataset and one proprietary dataset demonstrate that our active-learning-empowered anomaly detection pipeline could achieve a performance gain, in terms of F1-score, of more than 50% over the baseline algorithm. It also outperforms the existing active-learning-based methods by approximately 6%-10%, with a significantly reduced budget (the ratio of samples to be labeled).
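
The abstract does not spell out the three query strategies or the Isolation Forest weight update, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a generic uncertainty-style query (ask the operator about the samples whose anomaly scores lie closest to the current threshold) followed by choosing the threshold that maximizes F1 on the labeled samples. The function names `query_near_threshold`, `f1_score`, and `tune_threshold` are hypothetical, and the convention that a higher score means "more anomalous" is an assumption.

```python
from typing import List, Sequence, Tuple


def query_near_threshold(scores: Sequence[float], threshold: float, budget: int) -> List[int]:
    """Return indices of the `budget` samples whose scores are closest to the threshold.

    Assumption: samples near the decision boundary are the most informative to label.
    """
    order = sorted(range(len(scores)), key=lambda i: abs(scores[i] - threshold))
    return order[:budget]


def f1_score(labels: Sequence[int], preds: Sequence[int]) -> float:
    """F1 for binary labels, where 1 marks an anomaly."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def tune_threshold(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Try each labeled score as a threshold and keep the one with the highest F1.

    A sample is predicted anomalous when its score is >= the threshold.
    """
    best_t, best_f1 = float("inf"), -1.0
    for t in sorted(set(scores)):
        preds = [1 if s >= t else 0 for s in scores]
        f1 = f1_score(labels, preds)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Toy data (made up for illustration): anomaly scores and the labels an operator would give.
scores = [0.10, 0.20, 0.45, 0.58, 0.62, 0.90]
operator_labels = [0, 0, 0, 0, 1, 1]

queried = query_near_threshold(scores, threshold=0.5, budget=3)
new_threshold, labeled_f1 = tune_threshold(
    [scores[i] for i in queried],
    [operator_labels[i] for i in queried],
)
print(queried, new_threshold, labeled_f1)
```

With a budget of three labels, the threshold moves from 0.5 to 0.62, which removes the false alarm at score 0.58 in this toy example. In practice the scores would come from the unsupervised detector, and the tuned threshold should be validated on data that was not used for tuning.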
