Paper Title

Gabriella: An Online System for Real-Time Activity Detection in Untrimmed Security Videos

Paper Authors

Mamshad Nayeem Rizve, Ugur Demir, Praveen Tirupattur, Aayush Jung Rana, Kevin Duarte, Ishan Dave, Yogesh Singh Rawat, Mubarak Shah

Paper Abstract

Activity detection in security videos is a difficult problem due to multiple factors such as large field of view, presence of multiple activities, varying scales and viewpoints, and its untrimmed nature. The existing research in activity detection is mainly focused on datasets, such as UCF-101, JHMDB, THUMOS, and AVA, which partially address these issues. The requirement of processing the security videos in real-time makes this even more challenging. In this work we propose Gabriella, a real-time online system to perform activity detection on untrimmed security videos. The proposed method consists of three stages: tubelet extraction, activity classification, and online tubelet merging. For tubelet extraction, we propose a localization network which takes a video clip as input and spatio-temporally detects potential foreground regions at multiple scales to generate action tubelets. We propose a novel Patch-Dice loss to handle large variations in actor size. Our online processing of videos at a clip level drastically reduces the computation time in detecting activities. The detected tubelets are assigned activity class scores by the classification network and merged together using our proposed Tubelet-Merge Action-Split (TMAS) algorithm to form the final action detections. The TMAS algorithm efficiently connects the tubelets in an online fashion to generate action detections which are robust against varying length activities. We perform our experiments on the VIRAT and MEVA (Multiview Extended Video with Activities) datasets and demonstrate the effectiveness of the proposed approach in terms of speed (~100 fps) and performance with state-of-the-art results. The code and models will be made publicly available.
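
The abstract describes the Patch-Dice loss only at a high level, so the snippet below is a minimal sketch of one plausible reading: a Dice score is computed per spatial patch of the predicted foreground map rather than over the whole frame, so that small actors are not dominated by large ones. The function name `patch_dice_loss`, the `patch_size` argument, and the foreground-only averaging are illustrative assumptions, not details taken from the paper.

```python
import torch

def patch_dice_loss(pred, target, patch_size=16, eps=1e-6):
    """Hypothetical patch-wise Dice loss (sketch, not the paper's exact formulation).

    pred, target: (B, T, H, W) foreground probabilities and binary masks,
    with H and W assumed divisible by patch_size.
    """
    B, T, H, W = pred.shape
    # Split the spatial dimensions into non-overlapping patch_size x patch_size tiles.
    p = pred.unfold(2, patch_size, patch_size).unfold(3, patch_size, patch_size)
    t = target.unfold(2, patch_size, patch_size).unfold(3, patch_size, patch_size)
    p = p.contiguous().view(B, T, -1, patch_size * patch_size)
    t = t.contiguous().view(B, T, -1, patch_size * patch_size)

    # Dice score per patch instead of per frame.
    inter = (p * t).sum(dim=-1)
    union = p.sum(dim=-1) + t.sum(dim=-1)
    dice = (2 * inter + eps) / (union + eps)

    # Assumption: average only over patches that contain foreground pixels,
    # so empty background patches do not dominate the loss.
    fg = (t.sum(dim=-1) > 0).float()
    return 1.0 - (dice * fg).sum() / fg.sum().clamp(min=1.0)

# Toy usage: 2 clips of 8 frames at 128x128 resolution.
pred = torch.rand(2, 8, 128, 128)
target = (torch.rand(2, 8, 128, 128) > 0.95).float()
print(patch_dice_loss(pred, target).item())
```

The TMAS merging step is likewise only summarized, so the following is a rough sketch of how an online tubelet-to-track linking loop could look; the `Tubelet` container, `box_iou` helper, `iou_thresh`, and `max_gap` values are hypothetical stand-ins, not the paper's actual data structures or thresholds.

```python
from dataclasses import dataclass

@dataclass
class Tubelet:
    start: int      # first frame index of the clip
    end: int        # last frame index of the clip
    box: tuple      # (x1, y1, x2, y2) spatial extent over the clip
    scores: dict    # class -> confidence from the classification stage

def box_iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-6)

def merge_online(clip_stream, iou_thresh=0.5, max_gap=8):
    """Link per-clip tubelets into longer action tracks, clip by clip."""
    open_tracks, finished = [], []
    for clip_start, tubelets in clip_stream:
        # Close tracks that have not been extended for more than max_gap frames.
        for track in list(open_tracks):
            if clip_start - track[-1].end > max_gap:
                open_tracks.remove(track)
                finished.append(track)
        # Extend an overlapping open track, or start a new one.
        for tb in tubelets:
            match = next((tr for tr in open_tracks
                          if box_iou(tr[-1].box, tb.box) >= iou_thresh), None)
            if match is not None:
                match.append(tb)
            else:
                open_tracks.append([tb])
    return finished + open_tracks
```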
