Paper title
An efficient aggregation method for the symbolic representation of temporal data
Paper authors
Abstract
Symbolic representations are a useful tool for the dimension reduction of temporal data, allowing for the efficient storage of and information retrieval from time series. They can also enhance the training of machine learning algorithms on time series data through noise reduction and reduced sensitivity to hyperparameters. The adaptive Brownian bridge-based aggregation (ABBA) method is one such effective and robust symbolic representation, demonstrated to accurately capture important trends and shapes in time series. However, in its current form the method struggles to process very large time series. Here we present a new variant of the ABBA method, called fABBA. This variant utilizes a new aggregation approach tailored to the piecewise representation of time series. By replacing the k-means clustering used in ABBA with a sorting-based aggregation technique, and thereby avoiding repeated sum-of-squares error computations, the computational complexity is significantly reduced. In contrast to the original method, the new approach does not require the number of time series symbols to be specified in advance. Through extensive tests we demonstrate that the new method significantly outperforms ABBA with a considerable reduction in runtime while also outperforming the popular SAX and 1d-SAX representations in terms of reconstruction accuracy. We further demonstrate that fABBA can compress other data types such as images.
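The sorting-based aggregation idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `aggregate`, the tolerance parameter `alpha`, the choice of the Euclidean norm as sorting key, and the sample pieces (taken here to be hypothetical (length, increment) pairs from a piecewise representation) are all assumptions made for illustration, and any scaling of the pieces is omitted. The sketch sorts the points once, then greedily groups each unassigned point with all unassigned points within distance `alpha`, so the number of groups (symbols) emerges from the data instead of being fixed in advance.

```python
import math

def aggregate(points, alpha):
    """Greedy sorting-based grouping of 2D points (illustrative sketch).

    Returns one integer group label per input point. The number of
    groups is determined by the tolerance alpha, not given up front.
    """
    # Precompute norms and sort the indices by norm (assumed sorting key).
    norms = [math.hypot(*p) for p in points]
    order = sorted(range(len(points)), key=lambda i: norms[i])
    labels = [-1] * len(points)
    next_label = 0
    for pos, i in enumerate(order):
        if labels[i] != -1:
            continue  # already absorbed into an earlier group
        labels[i] = next_label  # point i starts a new group
        for j in order[pos + 1:]:
            # Reverse triangle inequality: once the norm gap exceeds alpha,
            # no later point in the sorted order can be within alpha of i.
            if norms[j] - norms[i] > alpha:
                break
            if labels[j] == -1 and math.dist(points[i], points[j]) <= alpha:
                labels[j] = next_label
        next_label += 1
    return labels

# Hypothetical pieces; each group label is mapped to a letter.
pieces = [(1.0, 1.0), (1.1, 0.9), (5.0, 5.0), (5.2, 4.9), (1.05, 1.0)]
labels = aggregate(pieces, alpha=0.5)
symbols = "".join(chr(ord("a") + k) for k in labels)
print(labels, symbols)  # [0, 0, 1, 1, 0] aabba
```

Because the points are sorted once and the inner scan stops early, each point is compared with only a few neighbors when the groups are well separated, which avoids the repeated passes over all points that an iterative clustering method such as k-means requires.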