SCT: Set Constrained Temporal Transformer for Set Supervised Action Segmentation

Paper Title

SCT: Set Constrained Temporal Transformer for Set Supervised Action Segmentation

Authors

Mohsen Fayyaz, Juergen Gall

Abstract

Temporal action segmentation is a topic of increasing interest; however, annotating each frame in a video is cumbersome and costly. Weakly supervised approaches therefore aim at learning temporal action segmentation from videos that are only weakly labeled. In this work, we assume that for each training video only the list of actions that occur in the video is given, but not when, how often, and in which order they occur. To address this task, we propose an approach that can be trained end-to-end on such data. The approach divides the video into smaller temporal regions and predicts the action label and length of each region. In addition, the network estimates the action labels for each frame. By measuring how consistent the frame-wise predictions are with respect to the temporal regions and the annotated action labels, the network learns to divide a video into class-consistent regions. We evaluate our approach on three datasets, on which it achieves state-of-the-art results.
