Paper Title

A Novel Algorithm for Optimized Real Time Anomaly Detection in Timeseries

Paper Author

Kapoor, Krishnam

Paper Abstract

Observations that differ significantly from their neighbouring points but cannot be classified as noise are known as anomalies or outliers. Such anomalies are a cause for concern, and a timely warning about their presence can be valuable. In this paper, we evaluate and compare the performance of popular algorithms from machine learning and statistics in detecting anomalies in both offline and real-time data. Our aim is to devise an algorithm that handles all types of seasonal and non-seasonal data effectively and is fast enough to be of practical use in real time. It is important to detect not only global anomalies but also points that are anomalous only relative to their local surroundings. Such outliers can be termed contextual anomalies, because they derive their context from the neighbouring observations. We also require a method for automatically determining whether seasonality is present in the given data. To detect seasonality, the proposed algorithm takes a curve-fitting approach rather than model-based anomaly detection. The proposed model also introduces a filter that assesses the relative significance of local outliers and removes those deemed insignificant. Because the proposed model fits polynomials to buckets of time series data, it does not suffer from problems such as heteroskedasticity and breakouts, unlike statistical alternatives such as ARIMA, SARIMA and Holt-Winters. Experimental results show that the proposed algorithm performs better on both real-time and artificially generated datasets.
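The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of fitting a polynomial to each bucket of a time series and flagging points with unusually large residuals. It is not the paper's algorithm. The function names, the bucket size, the polynomial degree, and the MAD-based threshold are all assumptions made for illustration, and the seasonality check and significance filter mentioned in the abstract are not modelled here.

```python
from statistics import median


def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients (lowest order first).

    Solves the normal equations with Gauss-Jordan elimination; adequate
    for the low degrees and small buckets used in this illustration.
    """
    n = degree + 1
    # Augmented matrix [A^T A | A^T y] for the Vandermonde matrix A.
    m = [
        [sum(x ** (i + j) for x in xs) for j in range(n)]
        + [sum(y * x ** i for x, y in zip(xs, ys))]
        for i in range(n)
    ]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def bucket_anomalies(series, bucket_size=20, degree=2, threshold=3.5):
    """Indices whose residual from the per-bucket fit is a robust outlier.

    All parameter defaults are hypothetical choices, not values from the paper.
    """
    flagged = []
    for start in range(0, len(series), bucket_size):
        ys = series[start:start + bucket_size]
        if len(ys) <= degree + 1:
            continue  # too few points for a meaningful fit
        # Rescale x to [0, 1] inside the bucket to keep the fit well conditioned.
        xs = [i / (len(ys) - 1) for i in range(len(ys))]
        coeffs = polyfit(xs, ys, degree)
        residuals = [
            y - sum(c * x ** k for k, c in enumerate(coeffs))
            for x, y in zip(xs, ys)
        ]
        med = median(residuals)
        mad = median(abs(r - med) for r in residuals)
        if mad == 0:
            continue  # residuals are identical; nothing stands out
        for i, r in enumerate(residuals):
            # Modified z-score based on the median absolute deviation.
            if 0.6745 * abs(r - med) / mad > threshold:
                flagged.append(start + i)
    return flagged


# Synthetic example: a linear trend with a small repeating wiggle
# and one injected spike at index 27.
series = [0.5 * t + 0.1 * ((t * 7) % 5) for t in range(60)]
series[27] += 10.0
flagged = bucket_anomalies(series)
print(flagged)
```

Fitting each bucket separately means a point is judged against its local trend rather than the global level, which is one simple way to surface the contextual anomalies the abstract describes. The median-based score is used here because an ordinary standard deviation would itself be inflated by the spike.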
