Current Time Series Anomaly Detection Benchmarks are Flawed and are Creating the Illusion of Progress

Paper Title

Current Time Series Anomaly Detection Benchmarks are Flawed and are Creating the Illusion of Progress

Authors

Wu, Renjie, Keogh, Eamonn J.

Abstract

Time series anomaly detection has been a perennially important topic in data science, with papers dating back to the 1950s. However, in recent years there has been an explosion of interest in this topic, much of it driven by the success of deep learning in other domains and for other time series tasks. Most of these papers test on one or more of a handful of popular benchmark datasets, created by Yahoo, Numenta, NASA, etc. In this work we make a surprising claim. The majority of the individual exemplars in these datasets suffer from one or more of four flaws. Because of these four flaws, we believe that many published comparisons of anomaly detection algorithms may be unreliable, and more importantly, much of the apparent progress in recent years may be illusory. In addition to demonstrating these claims, with this paper we introduce the UCR Time Series Anomaly Archive. We believe that this resource will play a role similar to that of the UCR Time Series Classification Archive, by providing the community with a benchmark that allows meaningful comparisons between approaches and a meaningful gauge of overall progress.
