Paper Title
Detecting Attended Visual Targets in Video
Paper Authors
Paper Abstract
We address the problem of detecting attention targets in video. Our goal is to identify where each person in each frame of a video is looking, and correctly handle the case where the gaze target is out-of-frame. Our novel architecture models the dynamic interaction between the scene and head features and infers time-varying attention targets. We introduce a new annotated dataset, VideoAttentionTarget, containing complex and dynamic patterns of real-world gaze behavior. Our experiments show that our model can effectively infer dynamic attention in videos. In addition, we apply our predicted attention maps to two social gaze behavior recognition tasks, and show that the resulting classifiers significantly outperform existing methods. We achieve state-of-the-art performance on three datasets: GazeFollow (static images), VideoAttentionTarget (videos), and VideoCoAtt (videos), and obtain the first results for automatically classifying clinically-relevant gaze behavior without wearable cameras or eye trackers.
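The abstract describes, but does not detail, the architecture: a scene branch and a head branch whose features interact over time to produce an attention heatmap and an out-of-frame decision. Below is a minimal, hypothetical PyTorch sketch of one plausible two-branch design with a recurrent layer for time-varying attention. Every layer, size, and fusion choice is an assumption for illustration, not the paper's actual model.

# Hypothetical sketch of a two-branch spatiotemporal gaze-target model.
# The abstract only says the model "models the dynamic interaction between
# the scene and head features and infers time-varying attention targets";
# every design choice below is an illustrative assumption.
import torch
import torch.nn as nn

class GazeTargetNet(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        # Scene branch: encodes the full RGB frame.
        self.scene_enc = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, hidden, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Head branch: encodes a crop of the person's head.
        self.head_enc = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, hidden, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # -> (B*T, hidden, 1, 1)
        )
        # Recurrent layer over time, standing in for the paper's
        # temporal modeling of dynamic attention.
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        # Output heads: a spatial heatmap of the attended target, plus a
        # scalar logit for whether the target is in-frame at all
        # (mirroring the abstract's out-of-frame handling).
        self.heatmap = nn.Conv2d(hidden, 1, 1)
        self.in_frame = nn.Linear(hidden, 1)

    def forward(self, frames, head_crops):
        # frames, head_crops: (B, T, 3, H, W)
        B, T = frames.shape[:2]
        scene = self.scene_enc(frames.flatten(0, 1))    # (B*T, C, h, w)
        head = self.head_enc(head_crops.flatten(0, 1))  # (B*T, C, 1, 1)
        fused = scene * head                            # broadcast gating
        pooled = fused.mean(dim=(2, 3)).view(B, T, -1)  # (B, T, C)
        temporal, _ = self.rnn(pooled)                  # (B, T, C)
        gate = temporal.reshape(B * T, -1, 1, 1)
        maps = self.heatmap(fused * gate)               # (B*T, 1, h, w)
        return maps.view(B, T, *maps.shape[1:]), self.in_frame(temporal)

# Smoke test with random tensors.
model = GazeTargetNet()
hm, inframe = model(torch.randn(2, 5, 3, 64, 64), torch.randn(2, 5, 3, 64, 64))
print(hm.shape, inframe.shape)  # (2, 5, 1, 16, 16) and (2, 5, 1)

In the paper's setting, the heatmap head would be supervised with annotated gaze-target locations and the in-frame logit with the out-of-frame labels available in a dataset like VideoAttentionTarget.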