A Background-Agnostic Framework with Adversarial Training for Abnormal Event Detection in Video

Paper Title

A Background-Agnostic Framework with Adversarial Training for Abnormal Event Detection in Video

Authors

Georgescu, Mariana-Iuliana; Ionescu, Radu Tudor; Khan, Fahad Shahbaz; Popescu, Marius; Shah, Mubarak

Abstract

Abnormal event detection in video is a complex computer vision problem that has attracted significant attention in recent years. The complexity of the task arises from the commonly-adopted definition of an abnormal event, that is, a rarely occurring event that typically depends on the surrounding context. Following the standard formulation of abnormal event detection as outlier detection, we propose a background-agnostic framework that learns from training videos containing only normal events. Our framework is composed of an object detector, a set of appearance and motion auto-encoders, and a set of classifiers. Since our framework only looks at object detections, it can be applied to different scenes, provided that normal events are defined identically across scenes and that the single main factor of variation is the background. To overcome the lack of abnormal data during training, we propose an adversarial learning strategy for the auto-encoders. We create a scene-agnostic set of out-of-domain pseudo-abnormal examples, which are correctly reconstructed by the auto-encoders before applying gradient ascent on the pseudo-abnormal examples. We further utilize the pseudo-abnormal examples to serve as abnormal examples when training appearance-based and motion-based binary classifiers to discriminate between normal and abnormal latent features and reconstructions. We compare our framework with the state-of-the-art methods on four benchmark data sets, using various evaluation metrics. Compared to existing methods, the empirical results indicate that our approach achieves favorable performance on all data sets. In addition, we provide region-based and track-based annotations for two large-scale abnormal event detection data sets from the literature, namely ShanghaiTech and Subway.
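
The adversarial idea in the abstract (reconstruct normal examples well while applying gradient ascent on pseudo-abnormal examples) can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a tied-weight linear auto-encoder on 2-D points, and every name in it (`recon_error`, `recon_grad`, `train`, the weight `lam`, the toy data) is hypothetical and chosen only for illustration.

```python
# Toy sketch, NOT the authors' implementation.
# Assumed model: tied-weight linear auto-encoder x_hat = w * (w . x) in 2-D.

def recon_error(w, x):
    """Squared reconstruction error ||x - w * (w . x)||^2."""
    s = w[0] * x[0] + w[1] * x[1]            # 1-D latent code
    r = (x[0] - w[0] * s, x[1] - w[1] * s)   # residual
    return r[0] ** 2 + r[1] ** 2

def recon_grad(w, x):
    """Analytic gradient of recon_error with respect to w."""
    s = w[0] * x[0] + w[1] * x[1]
    r = (x[0] - w[0] * s, x[1] - w[1] * s)
    rw = r[0] * w[0] + r[1] * w[1]
    return (-2.0 * (r[0] * s + rw * x[0]),
            -2.0 * (r[1] * s + rw * x[1]))

def mean_grad(w, data):
    grads = [recon_grad(w, x) for x in data]
    return (sum(g[0] for g in grads) / len(data),
            sum(g[1] for g in grads) / len(data))

def mean_error(w, data):
    return sum(recon_error(w, x) for x in data) / len(data)

def train(normal, pseudo, lam=0.5, lr=0.05, steps=500):
    """Gradient descent on normal data, gradient ascent on pseudo-abnormal data."""
    w = [0.6, 0.4]  # arbitrary starting weights (assumption)
    for _ in range(steps):
        g_normal = mean_grad(w, normal)
        g_pseudo = mean_grad(w, pseudo)
        for k in range(2):
            # Minus sign on the pseudo-abnormal gradient turns descent into ascent.
            w[k] -= lr * (g_normal[k] - lam * g_pseudo[k])
    return w

# Hypothetical toy data: "normal" points lie on the x-axis,
# "pseudo-abnormal" points lie on the y-axis.
normal = [(1.0, 0.0), (-1.0, 0.0), (0.5, 0.0)]
pseudo = [(0.0, 1.0), (0.0, -0.9)]

w = train(normal, pseudo)
print("normal error:", mean_error(w, normal))
print("pseudo-abnormal error:", mean_error(w, pseudo))
```

In this toy setting the learned direction aligns with the normal data, so normal points are reconstructed almost exactly while pseudo-abnormal points keep a large reconstruction error. That gap is what a downstream classifier could exploit.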
