Paper Title
Monocular, One-stage, Regression of Multiple 3D People
Paper Authors
Paper Abstract
This paper focuses on the regression of multiple 3D people from a single RGB image. Existing approaches predominantly follow a multi-stage pipeline that first detects people in bounding boxes and then independently regresses their 3D body meshes. In contrast, we propose to Regress all meshes in a One-stage fashion for Multiple 3D People (termed ROMP). The approach is conceptually simple, bounding box-free, and able to learn a per-pixel representation in an end-to-end manner. Our method simultaneously predicts a Body Center heatmap and a Mesh Parameter map, which can jointly describe the 3D body mesh on the pixel level. Through a body-center-guided sampling process, the body mesh parameters of all people in the image are easily extracted from the Mesh Parameter map. Equipped with such a fine-grained representation, our one-stage framework is free of the complex multi-stage process and more robust to occlusion. Compared with state-of-the-art methods, ROMP achieves superior performance on the challenging multi-person benchmarks, including 3DPW and CMU Panoptic. Experiments on crowded/occluded datasets demonstrate the robustness under various types of occlusion. The released code is the first real-time implementation of monocular multi-person 3D mesh regression.
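To make the body-center-guided sampling described in the abstract concrete, below is a minimal sketch of how per-person mesh parameters could be read out of the two predicted maps. It assumes NumPy/SciPy, an illustrative confidence threshold, a hypothetical `sample_body_meshes` function name, and an assumed parameter dimensionality; it is not the authors' released implementation.

```python
# A minimal sketch of body-center-guided sampling: find body centers as local
# maxima of the Body Center heatmap, then sample the Mesh Parameter map at
# those pixels. Sizes, names, and thresholds are illustrative assumptions.
import numpy as np
from scipy.ndimage import maximum_filter

def sample_body_meshes(center_heatmap, param_map, conf_thresh=0.25):
    """Extract one mesh-parameter vector per detected person.

    center_heatmap: (H, W) array, per-pixel confidence of being a body center.
    param_map:      (C, H, W) array, per-pixel mesh parameters (e.g. camera +
                    SMPL pose/shape); C is an assumed, model-dependent size.
    Returns a list of (confidence, (y, x), params) tuples, one per person.
    """
    # A pixel counts as a body center if it is a local maximum of the heatmap
    # (simple non-maximum suppression via a max filter) above the threshold.
    local_max = maximum_filter(center_heatmap, size=5) == center_heatmap
    centers = np.argwhere(local_max & (center_heatmap > conf_thresh))

    people = []
    for y, x in centers:
        params = param_map[:, y, x]  # sample the parameter map at the center pixel
        people.append((float(center_heatmap[y, x]), (int(y), int(x)), params))
    # Most confident detections first.
    return sorted(people, key=lambda p: p[0], reverse=True)

if __name__ == "__main__":
    # Toy example: two synthetic "people" on a 64x64 output grid, 85-dim parameters (assumed).
    H, W, C = 64, 64, 85
    heatmap = np.zeros((H, W))
    heatmap[20, 30] = 0.9
    heatmap[45, 10] = 0.7
    params = np.random.randn(C, H, W)
    for conf, center, vec in sample_body_meshes(heatmap, params):
        print(f"person at {center}, confidence {conf:.2f}, {vec.shape[0]}-dim parameters")
```

Because every person is recovered by indexing the Mesh Parameter map at a single center pixel, no bounding boxes or per-person crops are needed, which is what makes the pipeline one-stage.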