Paper Title

Self-supervised Keypoint Correspondences for Multi-Person Pose Estimation and Tracking in Videos

Paper Authors

Umer Rafi, Andreas Doering, Bastian Leibe, Juergen Gall

Abstract

Video annotation is expensive and time consuming. Consequently, datasets for multi-person pose estimation and tracking are less diverse and have sparser annotations compared to large-scale image datasets for human pose estimation. This makes it challenging to learn deep-learning-based models for associating keypoints across frames that are robust to nuisance factors such as motion blur and occlusions for the task of multi-person pose tracking. To address this issue, we propose an approach that relies on keypoint correspondences for associating persons in videos. Instead of training the network for estimating keypoint correspondences on video data, it is trained on a large-scale image dataset for human pose estimation using self-supervision. Combined with a top-down framework for human pose estimation, we use keypoint correspondences to (i) recover missed pose detections and (ii) associate pose detections across video frames. Our approach achieves state-of-the-art results for multi-frame pose estimation and multi-person pose tracking on the PoseTrack 2017 and PoseTrack 2018 datasets.
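The second use of keypoint correspondences, associating pose detections across frames, can be illustrated with a minimal sketch. This is not the paper's implementation: the `warp` function below stands in for the learned correspondence network (which propagates a person's keypoints from one frame into the next), and the greedy distance-based matching and the `dist_thresh` parameter are illustrative assumptions.

```python
import numpy as np

def associate_poses(prev_poses, curr_poses, warp, dist_thresh=20.0):
    """Greedily associate poses across two frames via keypoint correspondences.

    prev_poses: list of (K, 2) keypoint arrays from frame t
    curr_poses: list of (K, 2) keypoint arrays from frame t+1
    warp: callable mapping frame-t keypoints into frame t+1 coordinates
          (a stand-in for the learned correspondence network)
    Returns a list of (prev_idx, curr_idx) matches.
    """
    if not prev_poses or not curr_poses:
        return []
    # Propagate previous-frame keypoints into the current frame.
    propagated = [warp(p) for p in prev_poses]
    # Cost: mean Euclidean distance between propagated and candidate keypoints.
    cost = np.array([[np.mean(np.linalg.norm(p - c, axis=1))
                      for c in curr_poses] for p in propagated])
    matches, used_prev, used_curr = [], set(), set()
    # Take lowest-cost pairs first; skip pairs above the distance threshold.
    for flat_idx in np.argsort(cost, axis=None):
        i, j = np.unravel_index(flat_idx, cost.shape)
        if i in used_prev or j in used_curr or cost[i, j] > dist_thresh:
            continue
        matches.append((int(i), int(j)))
        used_prev.add(int(i))
        used_curr.add(int(j))
    return matches
```

For example, two poses whose detections are returned in a different order in the next frame are still matched correctly, since the cost is computed in the propagated coordinates rather than by detection index.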
