Paper Title

Imputing Missing Observations with Time Sliced Synthetic Minority Oversampling Technique

Authors

Baumgartner, Andrew; Molani, Sevda; Wei, Qi; Hadlock, Jennifer

Abstract

We present a simple yet novel time series imputation technique with the goal of constructing an irregular time series that is uniform across every sample in a data set. Specifically, we fix a grid defined by the midpoints of non-overlapping bins (dubbed "slices") of observation times and ensure that each sample has values for all of the features at each grid time. This allows one both to impute fully missing observations, so that time series classification can be applied uniformly across the entire data set, and, in special cases, to impute individually missing features. To do so, we slightly generalize the well-known class imbalance algorithm SMOTE \cite{smote} to allow component-wise nearest neighbor interpolation that preserves correlations when there are no missing features. We visualize the method in the simplified setting of 2-dimensional uncoupled harmonic oscillators. Next, we use tSMOTE to train an Encoder/Decoder long short-term memory (LSTM) model with Logistic Regression for predicting and classifying distinct trajectories of different 2D oscillators. After illustrating the utility of tSMOTE in this context, we use the same architecture to train a clinical model for COVID-19 disease severity on an imputed data set. Our experiments show an improvement over standard mean and median imputation techniques by allowing a wider class of patient trajectories to be recognized by the model, as well as an improvement over aggregated classification models.
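The two ingredients named in the abstract, a grid of slice midpoints and SMOTE-style interpolation between neighbors, can be illustrated with a minimal sketch. This is not the authors' tSMOTE implementation: the function names (`slice_midpoints`, `missing_slices`, `smote_interpolate`), the equal-width slices, and the per-feature random gap used for the component-wise variant are all assumptions made for illustration only.

```python
import numpy as np

def slice_midpoints(t_min, t_max, n_slices):
    """Edges and midpoints of equal-width, non-overlapping time slices.
    Equal widths are an assumption; the abstract only says the bins do not overlap."""
    edges = np.linspace(t_min, t_max, n_slices + 1)
    mids = 0.5 * (edges[:-1] + edges[1:])
    return edges, mids

def missing_slices(obs_times, edges):
    """Indices of slices in which one sample has no observation at all."""
    n_slices = len(edges) - 1
    # np.digitize returns 1-based bin indices; clip so t == t_max lands in the last slice.
    idx = np.clip(np.digitize(obs_times, edges) - 1, 0, n_slices - 1)
    return sorted(set(range(n_slices)) - set(idx.tolist()))

def smote_interpolate(x, neighbor, rng, componentwise=False):
    """SMOTE-style synthetic point on the segment between x and a neighbor.
    Classic SMOTE draws one random gap in [0, 1) for the whole vector;
    the hypothetical component-wise variant draws one gap per feature."""
    x = np.asarray(x, dtype=float)
    neighbor = np.asarray(neighbor, dtype=float)
    gap = rng.random(x.shape) if componentwise else rng.random()
    return x + gap * (neighbor - x)

edges, mids = slice_midpoints(0.0, 10.0, 5)
gaps = missing_slices(np.array([0.5, 1.2, 7.9]), edges)
rng = np.random.default_rng(0)
joint = smote_interpolate([0.0, 0.0], [1.0, 2.0], rng)
per_feature = smote_interpolate([0.0, 0.0], [1.0, 2.0], rng, componentwise=True)
```

With a single shared gap, the synthetic point stays on the line between the two neighbors, so the ratio between features is preserved. Per-feature gaps spread points over the box spanned by the neighbors instead.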
