Paper Title

Snipper: A Spatiotemporal Transformer for Simultaneous Multi-Person 3D Pose Estimation Tracking and Forecasting on a Video Snippet

Authors

Shihao Zou, Yuanlu Xu, Chao Li, Lingni Ma, Li Cheng, Minh Vo

Abstract

Multi-person pose understanding from RGB videos involves three complex tasks: pose estimation, tracking, and motion forecasting. Intuitively, accurate multi-person pose estimation facilitates robust tracking, and robust tracking builds the crucial history for correct motion forecasting. Most existing works either focus on a single task or employ multi-stage approaches to solve the tasks separately, which tends to make sub-optimal decisions at each stage and also fails to exploit correlations among the three tasks. In this paper, we propose Snipper, a unified framework to perform multi-person 3D pose estimation, tracking, and motion forecasting simultaneously in a single stage. We propose an efficient yet powerful deformable attention mechanism to aggregate spatiotemporal information from the video snippet. Building upon this deformable attention, a video transformer is learned to encode the spatiotemporal features from the multi-frame snippet and to decode informative pose features for multi-person pose queries. Finally, these pose queries are regressed to predict multi-person pose trajectories and future motions in a single shot. In the experiments, we show the effectiveness of Snipper on three challenging public datasets, where our generic model rivals specialized state-of-the-art baselines for pose estimation, tracking, and forecasting.
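
To make the pipeline described in the abstract concrete, below is a minimal PyTorch sketch of a Snipper-style single-stage model: spatiotemporal features from a video snippet are encoded by a transformer, a set of learned pose queries is decoded against them, and each query regresses a 3D pose trajectory (observed plus future frames) and a presence score. Standard multi-head attention stands in for the paper's deformable attention, and all module names, shapes, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a Snipper-style single-stage pipeline (illustrative only).
# nn.MultiheadAttention inside the standard transformer layers stands in for the
# paper's deformable attention; all names, shapes, and hyperparameters are assumptions.
import torch
import torch.nn as nn


class SnippetPoseTransformer(nn.Module):
    def __init__(self, feat_dim=256, num_queries=20, num_joints=15,
                 num_frames=4, num_future=2, num_layers=4, num_heads=8):
        super().__init__()
        self.num_frames, self.num_future = num_frames, num_future
        self.num_joints = num_joints
        # Encoder over flattened (T*H*W) spatiotemporal tokens from a CNN backbone.
        enc_layer = nn.TransformerEncoderLayer(feat_dim, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers)
        dec_layer = nn.TransformerDecoderLayer(feat_dim, num_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers)
        # One learned query per (person slot, frame); a person slot keeps the same
        # index across observed and future frames, so identity is tied to the query.
        self.pose_queries = nn.Embedding(num_queries * (num_frames + num_future), feat_dim)
        # Heads: 3D joints per query plus a per-query presence score.
        self.joint_head = nn.Linear(feat_dim, num_joints * 3)
        self.score_head = nn.Linear(feat_dim, 1)

    def forward(self, snippet_feats):
        # snippet_feats: (B, T*H*W, C) flattened spatiotemporal tokens.
        memory = self.encoder(snippet_feats)
        B = snippet_feats.shape[0]
        queries = self.pose_queries.weight.unsqueeze(0).expand(B, -1, -1)
        decoded = self.decoder(queries, memory)           # (B, Q*(T+F), C)
        joints = self.joint_head(decoded)                 # 3D joints per query
        scores = self.score_head(decoded).sigmoid()       # presence per query
        T = self.num_frames + self.num_future
        joints = joints.view(B, -1, T, self.num_joints, 3)  # (B, persons, frames, J, 3)
        scores = scores.view(B, -1, T)
        return joints, scores


# Toy usage: 2 snippets, 4 observed frames of 16x16 feature maps with 256 channels.
feats = torch.randn(2, 4 * 16 * 16, 256)
model = SnippetPoseTransformer()
poses, scores = model(feats)
print(poses.shape)   # torch.Size([2, 20, 6, 15, 3])
```

In this sketch, because each person slot owns one query index across all observed and future frames, tracking identities over the snippet and forecasting future poses are read directly off the query dimension rather than handled by a separate association stage, which mirrors the single-stage design the abstract describes.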
