SummerTime: Variable-length Time Series Summarization

Paper Title

SummerTime: Variable-length Time Series Summarization with Applications to Physical Activity Analysis

Authors

Amaral, Kevin M., Li, Zihan, Ding, Wei, Crouter, Scott, Chen, Ping

Abstract

\textit{SummerTime} globally summarizes time series signals, producing a fixed-length, robust summary of a variable-length time series. Many classical machine learning methods for classification and regression require data instances with a fixed number of features, so they cannot be applied directly to variable-length time series data. One common workaround is to classify over a sliding window and aggregate the decisions made at local sections of the time series, using majority voting for classification or averaging for regression. The downside is that minority local information is lost in the voting process, and averaging assumes that every time series measurement is equally significant. Moreover, because time series vary in length, the quality of votes and averages can vary greatly when there is a close voting tie or a bimodal distribution in the regression domain. The summarization produced by the \textit{SummerTime} method is a fixed-length feature vector that can be used in place of the time series dataset with classical machine learning methods. We apply Gaussian mixture models (GMMs) over small, same-length, disjoint windows in the time series to group local data into clusters. The time series' rate of membership in each cluster becomes a feature in the summarization. The model is naturally capable of converging to an appropriate cluster count. We compare our results to state-of-the-art studies in physical activity classification and show high-quality improvement when classifying with only the summarization. Finally, we show that regression using the summarization can augment energy expenditure estimation, producing more robust and precise results.
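The membership-rate idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: each window is reduced to a single illustrative feature (its mean), the mixture is one-dimensional, its parameters are hand-picked rather than fitted (in practice they would be estimated from training windows, for example by EM), and each window is hard-assigned to its most probable component. All function names and the `components` values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# (weight, mean, variance) of one 1-D Gaussian component.
# Hypothetical hand-picked values stand in for a fitted mixture.
Component = Tuple[float, float, float]


def disjoint_windows(series: Sequence[float], width: int) -> List[Sequence[float]]:
    """Split a series into same-length disjoint windows, dropping any short tail."""
    return [series[i:i + width] for i in range(0, len(series) - width + 1, width)]


def window_feature(window: Sequence[float]) -> float:
    """Illustrative per-window feature: the window mean (an assumption)."""
    return sum(window) / len(window)


def responsibilities(x: float, components: Sequence[Component]) -> List[float]:
    """Posterior probability of each component given x, computed in log space."""
    logs = [
        math.log(w) - 0.5 * math.log(2 * math.pi * var) - 0.5 * (x - mu) ** 2 / var
        for w, mu, var in components
    ]
    top = max(logs)
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def summarize(series: Sequence[float], width: int,
              components: Sequence[Component]) -> List[float]:
    """Fixed-length summary: fraction of windows assigned to each component."""
    windows = disjoint_windows(series, width)
    if not windows:
        raise ValueError("series is shorter than one window")
    counts = [0] * len(components)
    for win in windows:
        resp = responsibilities(window_feature(win), components)
        counts[resp.index(max(resp))] += 1
    return [c / len(windows) for c in counts]


# Two series of different lengths map to vectors of the same length.
components = [(0.5, 0.0, 1.0), (0.5, 10.0, 1.0)]
short_series = [0.1, -0.2, 0.0, 0.3, 9.8, 10.1, 10.0, 9.9]
long_series = [0.0] * 12 + [10.0] * 4
print(summarize(short_series, 4, components))  # [0.5, 0.5]
print(summarize(long_series, 4, components))   # [0.75, 0.25]
```

Because the output length equals the number of mixture components rather than the series length, the resulting vectors could be passed to any fixed-input classifier or regressor. A mixture that selects its own effective component count, as the abstract describes, would replace the hand-picked `components` here.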
