Paper Title
Monocular, One-stage, Regression of Multiple 3D People
Paper Authors
Paper Abstract
This paper focuses on the regression of multiple 3D people from a single RGB image. Existing approaches predominantly follow a multi-stage pipeline that first detects people in bounding boxes and then independently regresses their 3D body meshes. In contrast, we propose to Regress all meshes in a One-stage fashion for Multiple 3D People (termed ROMP). The approach is conceptually simple, bounding box-free, and able to learn a per-pixel representation in an end-to-end manner. Our method simultaneously predicts a Body Center heatmap and a Mesh Parameter map, which can jointly describe the 3D body mesh on the pixel level. Through a body-center-guided sampling process, the body mesh parameters of all people in the image are easily extracted from the Mesh Parameter map. Equipped with such a fine-grained representation, our one-stage framework is free of the complex multi-stage process and more robust to occlusion. Compared with state-of-the-art methods, ROMP achieves superior performance on the challenging multi-person benchmarks, including 3DPW and CMU Panoptic. Experiments on crowded/occluded datasets demonstrate the robustness under various types of occlusion. The released code is the first real-time implementation of monocular multi-person 3D mesh regression.
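To make the body-center-guided sampling described in the abstract concrete, below is a minimal sketch of how per-person mesh parameters could be read out of the two predicted maps. It assumes NumPy/SciPy, an illustrative confidence threshold, a hypothetical `sample_body_meshes` function name, and an assumed parameter dimensionality; it is not the authors' released implementation.

```python
# A minimal sketch of body-center-guided sampling: find body centers as local
# maxima of the Body Center heatmap, then sample the Mesh Parameter map at
# those pixels. Sizes, names, and thresholds are illustrative assumptions.
import numpy as np
from scipy.ndimage import maximum_filter

def sample_body_meshes(center_heatmap, param_map, conf_thresh=0.25):
    """Extract one mesh-parameter vector per detected person.

    center_heatmap: (H, W) array, per-pixel confidence of being a body center.
    param_map:      (C, H, W) array, per-pixel mesh parameters (e.g. camera +
                    SMPL pose/shape); C is an assumed, model-dependent size.
    Returns a list of (confidence, (y, x), params) tuples, one per person.
    """
    # A pixel counts as a body center if it is a local maximum of the heatmap
    # (simple non-maximum suppression via a max filter) above the threshold.
    local_max = maximum_filter(center_heatmap, size=5) == center_heatmap
    centers = np.argwhere(local_max & (center_heatmap > conf_thresh))

    people = []
    for y, x in centers:
        params = param_map[:, y, x]  # sample the parameter map at the center pixel
        people.append((float(center_heatmap[y, x]), (int(y), int(x)), params))
    # Most confident detections first.
    return sorted(people, key=lambda p: p[0], reverse=True)

if __name__ == "__main__":
    # Toy example: two synthetic "people" on a 64x64 output grid, 85-dim parameters (assumed).
    H, W, C = 64, 64, 85
    heatmap = np.zeros((H, W))
    heatmap[20, 30] = 0.9
    heatmap[45, 10] = 0.7
    params = np.random.randn(C, H, W)
    for conf, center, vec in sample_body_meshes(heatmap, params):
        print(f"person at {center}, confidence {conf:.2f}, {vec.shape[0]}-dim parameters")
```

Because every person is recovered by indexing the Mesh Parameter map at a single center pixel, no bounding boxes or per-person crops are needed, which is what makes the pipeline one-stage.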