Paper Title

Actor-identified Spatiotemporal Action Detection -- Detecting Who Is Doing What in Videos

Authors

Fan Yang, Norimichi Ukita, Sakriani Sakti, Satoshi Nakamura

Abstract

The success of deep learning on video Action Recognition (AR) has motivated researchers to progressively advance related tasks from the coarse level to the fine-grained level. Compared with conventional AR, which only predicts an action label for the entire video, Temporal Action Detection (TAD) has been investigated for estimating the start and end time of each action in a video. Taking TAD a step further, Spatiotemporal Action Detection (SAD) has been studied for localizing the action both spatially and temporally in videos. However, who performs the action is generally ignored in SAD, although identifying the actor can also be important. To this end, we propose a novel task, Actor-identified Spatiotemporal Action Detection (ASAD), to bridge the gap between SAD and actor identification. In ASAD, we not only detect the spatiotemporal boundary of each instance-level action but also assign a unique ID to each actor. To approach ASAD, Multiple Object Tracking (MOT) and Action Classification (AC) are two fundamental elements. By using MOT, the spatiotemporal boundary of each actor is obtained and assigned to a unique actor identity. By using AC, the action class is estimated within the corresponding spatiotemporal boundary. Since ASAD is a new task, it poses many new challenges that cannot be addressed by existing methods: i) no dataset has been created specifically for ASAD, ii) no evaluation metrics have been designed for ASAD, and iii) current MOT performance is the bottleneck to obtaining satisfactory ASAD results. To address these problems, we i) annotate a new ASAD dataset, ii) propose ASAD evaluation metrics that consider multi-label actions and actor identification, and iii) improve the data association strategies in MOT to boost MOT performance, which leads to better ASAD results. The code is available at https://github.com/fandulu/ASAD.
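To make the two-stage idea in the abstract concrete, the following is a minimal Python sketch of how MOT output might be grouped into per-actor spatiotemporal tubes and then labeled by an action classifier. It is not the authors' implementation: the names `TrackedBox`, `ActorTube`, `build_tubes`, `label_tubes`, and `toy_classifier`, as well as the 0.5 score threshold, are hypothetical and chosen only for illustration. The classifier is a placeholder rule standing in for a real action classification model.

```python
from dataclasses import dataclass, field


@dataclass
class TrackedBox:
    """One MOT output record: a box for one actor in one frame (hypothetical format)."""
    frame: int
    actor_id: int
    box: tuple  # (x1, y1, x2, y2) in pixels


@dataclass
class ActorTube:
    """Spatiotemporal boundary of a single actor, plus its predicted actions."""
    actor_id: int
    frames: list = field(default_factory=list)
    boxes: list = field(default_factory=list)
    actions: set = field(default_factory=set)


def build_tubes(tracked_boxes):
    """Group MOT records by actor ID, ordered by frame, into one tube per actor."""
    tubes = {}
    for tb in sorted(tracked_boxes, key=lambda t: t.frame):
        tube = tubes.setdefault(tb.actor_id, ActorTube(tb.actor_id))
        tube.frames.append(tb.frame)
        tube.boxes.append(tb.box)
    return tubes


def label_tubes(tubes, classify, threshold=0.5):
    """Multi-label step: keep every action whose score reaches the threshold."""
    for tube in tubes.values():
        scores = classify(tube)  # dict mapping action name -> score in [0, 1]
        tube.actions = {name for name, s in scores.items() if s >= threshold}
    return tubes


def toy_classifier(tube):
    """Placeholder for a real AC model: uses horizontal displacement only."""
    dx = tube.boxes[-1][0] - tube.boxes[0][0]
    return {"walk": 1.0 if dx > 0 else 0.0, "stand": 1.0 if dx == 0 else 0.0}


# Assumed toy input: two actors tracked over two frames.
tracks = [
    TrackedBox(0, 1, (10, 10, 50, 100)),
    TrackedBox(0, 2, (200, 20, 240, 110)),
    TrackedBox(1, 1, (14, 10, 54, 100)),
    TrackedBox(1, 2, (200, 20, 240, 110)),
]
tubes = label_tubes(build_tubes(tracks), toy_classifier)
for t in tubes.values():
    print(t.actor_id, t.frames[0], t.frames[-1], sorted(t.actions))
```

Each resulting tube answers "who" (the actor ID), "where and when" (the boxes and frames), and "what" (the set of action labels), which is the output structure the ASAD task description implies.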
