Minority Oversampling for Imbalanced Time Series Classification

Paper Title

Minority Oversampling for Imbalanced Time Series Classification

Authors

Tuanfei Zhu, Cheng Luo, Jing Li, Siqi Ren, Zhihong Zhang

Abstract

Many important real-world applications involve time-series data with skewed class distributions. Compared with conventional imbalanced learning problems, classifying imbalanced time-series data is more challenging because of its high dimensionality and high inter-variable correlation. This paper proposes a structure-preserving Oversampling method for High-dimensional Imbalanced Time-series classification (OHIT). OHIT first uses a density-ratio-based shared nearest neighbor clustering algorithm to capture the modes of the minority class in high-dimensional space. For each mode, it then applies a shrinkage technique for large-dimensional covariance matrices to obtain an accurate and reliable covariance structure. Finally, OHIT generates structure-preserving synthetic samples from a multivariate Gaussian distribution parameterized by the estimated covariance matrices. Experimental results on several publicly available time-series datasets (both unimodal and multimodal) demonstrate the superiority of OHIT over state-of-the-art oversampling algorithms in terms of F1, G-mean, and AUC.
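The last two steps described in the abstract (per-mode covariance shrinkage followed by Gaussian sampling) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a Ledoit-Wolf-style shrinkage toward a scaled identity, which the abstract does not name explicitly, and it takes the mode labels as given instead of computing them with the density-ratio shared nearest neighbor clustering. The function names `shrinkage_covariance` and `oversample_by_mode`, and the rule of allocating synthetic samples in proportion to mode size, are hypothetical choices made for illustration.

```python
import numpy as np


def shrinkage_covariance(X):
    """Shrink the sample covariance toward a scaled identity.

    Assumption: a Ledoit-Wolf-style estimator; the abstract only says
    "shrinkage technique of large-dimensional covariance matrix".
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                    # sample covariance (1/n normalisation)
    mu = np.trace(S) / p                 # average variance, scale of the target
    delta2 = np.sum((S - mu * np.eye(p)) ** 2) / p
    if delta2 == 0.0:                    # S already equals the target
        return S
    # Mean squared Frobenius distance between x_i x_i^T and S, divided by n * p
    row_norm4 = np.sum(np.sum(Xc ** 2, axis=1) ** 2)
    beta2 = (row_norm4 - n * np.sum(S ** 2)) / (n ** 2 * p)
    rho = min(max(beta2, 0.0), delta2) / delta2   # shrinkage intensity in [0, 1]
    return rho * mu * np.eye(p) + (1.0 - rho) * S


def oversample_by_mode(X_min, mode_labels, n_new, seed=0):
    """Draw n_new synthetic minority samples, one Gaussian per mode.

    Assumption: mode_labels come from some clustering step (not shown), and
    samples are allocated to modes in proportion to mode size.
    """
    rng = np.random.default_rng(seed)
    modes, counts = np.unique(mode_labels, return_counts=True)
    quotas = np.floor(n_new * counts / counts.sum()).astype(int)
    quotas[: n_new - quotas.sum()] += 1  # hand out the rounding remainder
    parts = []
    for mode, quota in zip(modes, quotas):
        if quota == 0:
            continue
        X_mode = X_min[mode_labels == mode]
        mean = X_mode.mean(axis=0)
        cov = shrinkage_covariance(X_mode)
        parts.append(rng.multivariate_normal(mean, cov, size=quota))
    return np.vstack(parts)


# Usage on toy data: 20 minority series of length 50, split into two modes.
toy_rng = np.random.default_rng(42)
X_min = toy_rng.normal(size=(20, 50))
mode_labels = np.array([0] * 12 + [1] * 8)
X_syn = oversample_by_mode(X_min, mode_labels, n_new=30)
```

With fewer samples than dimensions, as in the toy data, the plain sample covariance is singular. The shrinkage term keeps the estimate positive definite, which is what makes the Gaussian sampling step well behaved in high dimensions.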
