MEBOW: Monocular Estimation of Body Orientation In the Wild

Paper Title

MEBOW: Monocular Estimation of Body Orientation In the Wild

Authors

Chenyan Wu, Yukun Chen, Jiajia Luo, Che-Chun Su, Anuja Dawane, Bikramjot Hanzra, Zhuo Deng, Bilan Liu, James Wang, Cheng-Hao Kuo

Abstract

Body orientation estimation provides crucial visual cues in many applications, including robotics and autonomous driving. It is particularly desirable when 3-D pose estimation is difficult to infer due to poor image resolution, occlusion or indistinguishable body parts. We present COCO-MEBOW (Monocular Estimation of Body Orientation in the Wild), a new large-scale dataset for orientation estimation from a single in-the-wild image. The body-orientation labels for around 130K human bodies within 55K images from the COCO dataset have been collected using an efficient and high-precision annotation pipeline. We also validated the benefits of the dataset. First, we show that our dataset can substantially improve the performance and the robustness of a human body orientation estimation model, the development of which was previously limited by the scale and diversity of the available training data. Additionally, we present a novel triple-source solution for 3-D human pose estimation, where 3-D pose labels, 2-D pose labels, and our body-orientation labels are all used in joint training. Our model significantly outperforms state-of-the-art dual-source solutions for monocular 3-D human pose estimation, where training only uses 3-D pose labels and 2-D pose labels. This substantiates an important advantage of MEBOW for 3-D human pose estimation, which is particularly appealing because the per-instance labeling cost for body orientations is far less than that for 3-D poses. The work demonstrates high potential of MEBOW in addressing real-world challenges involving understanding human behaviors. Further information of this work is available at https://chenyanwu.github.io/MEBOW/.
