Paper Title

OPP-Miner: Order-preserving sequential pattern mining

Authors

Wu, Youxi; Hu, Qian; Li, Yan; Guo, Lei; Zhu, Xingquan; Wu, Xindong

Abstract

A time series is a collection of measurements in chronological order. Discovering patterns in time series is useful in many domains, such as stock analysis, disease detection, and weather forecasting. To discover patterns, existing methods often convert time series data into another form, such as a nominal/symbolic format, to reduce dimensionality, which inevitably introduces deviation from the original data values. Moreover, existing methods mostly neglect the order relationships between time series values. To tackle these issues, and inspired by order-preserving matching, this paper proposes an Order-Preserving sequential Pattern (OPP) mining method, which represents patterns based on the order relationships of the time series data. An inherent advantage of this representation is that the trend of a time series can be represented by the relative order of its underlying values. To obtain frequent trends in time series, we propose the OPP-Miner algorithm to mine patterns with the same trend (sub-sequences with the same relative order). OPP-Miner employs filtration and verification strategies to calculate the support and uses a pattern fusion strategy to generate candidate patterns. To compress the result set, we also study finding the maximal OPPs. Experiments validate that OPP-Miner is not only efficient and scalable but can also discover similar sub-sequences in time series. In addition, case studies show that our algorithms have high utility in analyzing the COVID-19 epidemic by identifying critical trends and improving clustering performance.
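
To make the idea of "sub-sequences with the same relative order" concrete, here is a minimal Python sketch. It is a naive sliding-window count, not the OPP-Miner algorithm: it does not implement the filtration and verification strategies, pattern fusion, or maximal OPP mining described in the abstract. The function names `relative_order` and `frequent_order_patterns`, the sample series, and the assumption that values inside a window are distinct (ties are broken by position) are all illustrative choices rather than anything taken from the paper.

```python
from collections import Counter


def relative_order(window):
    """Return the rank of each value in the window (1 = smallest).

    Assumption: values in a window are distinct; ties fall back to position.
    """
    order = sorted(range(len(window)), key=lambda i: window[i])
    ranks = [0] * len(window)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return tuple(ranks)


def frequent_order_patterns(series, length, min_support):
    """Count every relative-order pattern of the given length by brute force."""
    counts = Counter(
        relative_order(series[i:i + length])
        for i in range(len(series) - length + 1)
    )
    # Keep only the patterns that occur at least min_support times.
    return {p: c for p, c in counts.items() if c >= min_support}


if __name__ == "__main__":
    series = [3, 5, 4, 8, 6, 9, 7]  # hypothetical data
    print(frequent_order_patterns(series, length=3, min_support=3))
```

In this toy series the windows (3, 5, 4), (4, 8, 6), and (6, 9, 7) have different values but share the relative order (1, 3, 2), so they count as occurrences of the same trend. A brute-force scan like this enumerates every window for every pattern length, which is the kind of cost that candidate generation and support-calculation strategies aim to reduce.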
