论文标题

标记为最佳分区

Labeled Optimal Partitioning

论文作者

Hocking, Toby Dylan, Srivastava, Anuraag

论文摘要

在在空间或时间上测量的数据序列中,一个重要的问题是准确检测突然变化。在部分标记的数据中,重要的是要正确预测火车和测试集的正/负标记区域的存在/不存在变化。一种现有的动态编程算法设计用于未标记的测试区域(并且忽略火车集中的标签);另一个是用于精确拟合火车标签(但不能预测未标记的测试区域中的更改点)。我们通过提出一个新的最佳更改点检测模型来解决这些问题,该模型可以保证适合火车数据中的标签,并且还可以预测测试数据中未标记的更改点。我们提出了一种新的动态编程算法,标记为“最佳分区”(Lopart),并提供了正式的证据,证明它可以解决所得的非凸优化问题。我们提供了算法的时间复杂性的理论和经验分析,就标签数量和分段数据序列的大小而言。最后,我们提供的经验证据表明,就火车和测试标签误差而言,我们的算法比现有基线更准确。

In data sequences measured over space or time, an important problem is accurate detection of abrupt changes. In partially labeled data, it is important to correctly predict presence/absence of changes in positive/negative labeled regions, in both the train and test sets. One existing dynamic programming algorithm is designed for prediction in unlabeled test regions (and ignores the labels in the train set); another is for accurate fitting of train labels (but does not predict changepoints in unlabeled test regions). We resolve these issues by proposing a new optimal changepoint detection model that is guaranteed to fit the labels in the train data, and can also provide predictions of unlabeled changepoints in test data. We propose a new dynamic programming algorithm, Labeled Optimal Partitioning (LOPART), and we provide a formal proof that it solves the resulting non-convex optimization problem. We provide theoretical and empirical analysis of the time complexity of our algorithm, in terms of the number of labels and the size of the data sequence to segment. Finally, we provide empirical evidence that our algorithm is more accurate than the existing baselines, in terms of train and test label error.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源