Paper Title

A Novel Online Action Detection Framework from Untrimmed Video Streams

Authors

Da-Hye Yoon, Nam-Gyu Cho, Seong-Whan Lee

Abstract

Online temporal action localization from an untrimmed video stream is a challenging problem in computer vision. It is challenging because i) in an untrimmed video stream, more than one action instance may appear, interleaved with background scenes, and ii) in an online setting, only past and current information is available. Therefore, temporal priors, such as the average action duration of the training data, which have been exploited by previous action detection methods, are not suitable for this task because of the high intra-class variation in human actions. We propose a novel online action detection framework that considers actions as a set of temporally ordered subclasses and leverages a future frame generation network to cope with the limited-information issue associated with the problem outlined above. Additionally, we augment our data by varying the lengths of videos to allow the proposed method to learn the high intra-class variation in human actions. We evaluate our method using two benchmark datasets, THUMOS'14 and ActivityNet, in an online temporal action localization scenario and demonstrate that its performance is comparable to state-of-the-art methods that have been proposed for offline settings.
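The abstract names two concrete ideas: splitting each action into temporally ordered subclasses, and augmenting training data by varying video lengths. The paper itself does not give code here, so the sketch below is only illustrative of those two ideas under simple assumptions (videos as frame arrays of shape (T, H, W, C); function names `temporally_rescale`, `augment_speeds`, and `subclass_labels` are hypothetical, not from the paper):

```python
import numpy as np

def temporally_rescale(frames: np.ndarray, scale: float) -> np.ndarray:
    """Resample a video along the time axis by `scale` (illustrative only).

    frames: array of shape (T, H, W, C).
    scale > 1 stretches the action (more frames); scale < 1 compresses it.
    """
    num_in = frames.shape[0]
    num_out = max(1, int(round(num_in * scale)))
    # Map each output index back to a source frame (nearest-neighbor in time).
    idx = np.linspace(0, num_in - 1, num_out).round().astype(int)
    return frames[idx]

def augment_speeds(frames: np.ndarray, scales=(0.5, 0.75, 1.0, 1.5, 2.0)):
    """Produce several speed variants of one clip so a model sees
    high intra-class variation in action duration."""
    return [temporally_rescale(frames, s) for s in scales]

def subclass_labels(start: int, end: int, num_sub: int = 3):
    """Split one action instance [start, end) into `num_sub` temporally
    ordered sub-segments, returning (sub_index, sub_start, sub_end) tuples.
    The number of subclasses per action is an assumption, not the paper's value.
    """
    bounds = np.linspace(start, end, num_sub + 1).round().astype(int)
    return [(k, int(bounds[k]), int(bounds[k + 1])) for k in range(num_sub)]
```

Nearest-neighbor sampling is used here only because it keeps original frames intact; the paper may well use a different resampling or augmentation scheme, and the future frame generation network it describes is not sketched, since the abstract gives no architectural detail to ground one.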
