Mutual Adaptive Reasoning for Monocular 3D Multi-Person Pose Estimation

Paper Title

Mutual Adaptive Reasoning for Monocular 3D Multi-Person Pose Estimation

Authors

Juze Zhang, Jingya Wang, Ye Shi, Fei Gao, Lan Xu, Jingyi Yu

Abstract

Inter-person occlusion and depth ambiguity make it challenging to estimate the 3D poses of multiple persons in camera-centric coordinates from a monocular image. Typical top-down frameworks suffer from high computational redundancy and require an additional detection stage. By contrast, bottom-up methods have low computational costs because they are less affected by the number of people in the scene. However, most existing bottom-up methods treat camera-centric 3D human pose estimation as two unrelated subtasks: 2.5D pose estimation and camera-centric depth estimation. In this paper, we propose a unified model that leverages the mutual benefits of these two subtasks. Within the framework, a robust structured 2.5D pose estimation is designed to recognize inter-person occlusion based on depth relationships. Additionally, we develop an end-to-end geometry-aware depth reasoning method that exploits the mutual benefits of 2.5D pose and camera-centric root depths. This method first uses 2.5D pose and geometry information to infer camera-centric root depths in a forward pass, and then exploits the root depths to further improve representation learning for 2.5D pose estimation in a backward pass. Further, we design an adaptive fusion scheme that leverages both visual perception and body geometry to alleviate the inherent depth ambiguity. Extensive experiments demonstrate the superiority of the proposed model over a wide range of bottom-up methods. Its accuracy is even competitive with that of top-down counterparts. Notably, the model runs much faster than existing bottom-up and top-down methods.
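To make the relationship between the two subtasks concrete, the sketch below shows how a 2.5D pose and a camera-centric root depth are commonly combined into camera-centric 3D joints under a standard pinhole camera model. This is an illustration under stated assumptions, not the authors' implementation: the function name `backproject_pose`, the per-joint tuple layout `(u, v, rel_depth)`, and the intrinsics `fx`, `fy`, `cx`, `cy` are all hypothetical choices made for this example.

```python
from typing import List, Tuple

Joint2p5D = Tuple[float, float, float]  # (u, v, rel_depth): pixels, pixels, depth relative to root
Joint3D = Tuple[float, float, float]    # (x, y, z) in camera-centric coordinates


def backproject_pose(
    pose_2p5d: List[Joint2p5D],
    root_depth: float,
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Joint3D]:
    """Lift a 2.5D pose to camera-centric 3D with a pinhole model.

    Assumption (not from the paper): rel_depth and root_depth share
    the same unit, and (fx, fy, cx, cy) are known camera intrinsics.
    """
    joints_3d = []
    for u, v, rel_depth in pose_2p5d:
        # Absolute depth of the joint = root depth + root-relative offset.
        z = root_depth + rel_depth
        # Invert the pinhole projection u = fx * x / z + cx (same for v).
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        joints_3d.append((x, y, z))
    return joints_3d


if __name__ == "__main__":
    # Hypothetical values: a root joint at the image center, 3 units away,
    # and a second joint 1 unit behind the root.
    pose = [(500.0, 500.0, 0.0), (1500.0, 500.0, 1.0)]
    print(backproject_pose(pose, root_depth=3.0, fx=1000.0, fy=1000.0, cx=500.0, cy=500.0))
```

Because every joint's absolute depth depends on the root depth, an error in the root depth shifts the whole skeleton, which is one way to see why the abstract treats the 2.5D pose and the root depth as mutually dependent rather than unrelated.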
