主题 - 时间序列中简单准确检测

论文标题

主题 - 时间序列中简单准确检测

Motiflets -- Simple and Accurate Detection of Motifs in Time Series

论文作者

Schäfer, Patrick, Leser, Ulf

论文摘要

时间序列图案是一个短时间序列，在较大的时间序列中重复自身大致相同。这样的图案通常代表隐藏的结构，例如心电图录音中的心跳，流行歌曲中的即兴演奏或脑电图中的睡眠纺锤。主题发现（MD）是在给定输入系列中找到此类主题的任务。由于有不同的定义，因此存在许多不同的算法。作为中心参数，它们都采用了基序的长度L和图案发生之间的最大距离r。但是，在实践中，尤其是针对R的合适值很难确定前期，并且发现基序甚至对于非常相似的R值也显示出很高的可变性。因此，找到一个有趣的主题需要广泛的反复试验。在本文中，我们对MD问题提出了另一种方法。我们将k-motiflet定义为长度为l的基序的精确k出现，其最大成对距离是最小的。这将MD问题上下倒置：我们方法的中心参数不是距离阈值r，而是该基序的所需出现数量k，我们显示的是更直观且易于设置。基于此定义，我们提出了用于查找K-单体并分析其复杂性的精确和近似算法。为了进一步简化我们的方法的使用，我们描述了统计工具以自动确定其输入参数的有意义值。通过评估几个现实世界数据集并将其与四种SOTA MD算法进行比较，我们表明我们提出的算法在数量上既优于其竞争对手，又在较高的相似性上找到较大的主题集，并且在质量上更高，并且在无需进行手动调用的情况下更清晰，更易于解释主题。

A time series motif intuitively is a short time series that repeats itself approximately the same within a larger time series. Such motifs often represent concealed structures, such as heart beats in an ECG recording, the riff in a pop song, or sleep spindles in EEG sleep data. Motif discovery (MD) is the task of finding such motifs in a given input series. As there are varying definitions of what exactly a motif is, a number of different algorithms exist. As central parameters they all take the length l of the motif and the maximal distance r between the motif's occurrences. In practice, however, especially suitable values for r are very hard to determine upfront, and found motifs show a high variability even for very similar r values. Accordingly, finding an interesting motif requires extensive trial-and-error. In this paper, we present a different approach to the MD problem. We define k-Motiflets as the set of exactly k occurrences of a motif of length l, whose maximum pairwise distance is minimal. This turns the MD problem upside-down: The central parameter of our approach is not the distance threshold r, but the desired number of occurrence k of the motif, which we show is considerably more intuitive and easier to set. Based on this definition, we present exact and approximate algorithms for finding k-Motiflets and analyze their complexity. To further ease the use of our method, we describe statistical tools to automatically determine meaningful values for its input parameters. By evaluation on several real-world data sets and comparison to four SotA MD algorithms, we show that our proposed algorithm is both quantitatively superior to its competitors, finding larger motif sets at higher similarity, and qualitatively better, leading to clearer and easier to interpret motifs without any need for manual tuning.

下载PDF全文

下载文献需遵守相关版权规定

论文标题