DOPE: Distillation Of Part Experts for whole-body 3D pose estimation in the wild

Paper title

DOPE: Distillation Of Part Experts for whole-body 3D pose estimation in the wild

Authors

Philippe Weinzaepfel, Romain Brégier, Hadrien Combaluzier, Vincent Leroy, Grégory Rogez

Abstract

We introduce DOPE, the first method to detect and estimate whole-body 3D human poses, including bodies, hands and faces, in the wild. Achieving this level of detail is key for a number of applications that require understanding the interactions of the people with each other or with the environment. The main challenge is the lack of in-the-wild data with labeled whole-body 3D poses. In previous work, training data has been annotated or generated for simpler tasks focusing on bodies, hands or faces separately. In this work, we propose to take advantage of these datasets to train independent experts for each part, namely a body, a hand and a face expert, and distill their knowledge into a single deep network designed for whole-body 2D-3D pose detection. In practice, given a training image with partial or no annotation, each part expert detects its subset of keypoints in 2D and 3D and the resulting estimations are combined to obtain whole-body pseudo ground-truth poses. A distillation loss encourages the whole-body predictions to mimic the experts' outputs. Our results show that this approach significantly outperforms the same whole-body model trained without distillation while staying close to the performance of the experts. Importantly, DOPE is computationally less demanding than the ensemble of experts and can achieve real-time performance. Test code and models are available at https://europe.naverlabs.com/research/computer-vision/dope.
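
The distillation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the keypoint counts in PART_SIZES, the function names, the masking scheme and the choice of an L1 distance are all assumptions made for illustration. It shows only the general idea of merging per-part expert estimates into one pseudo ground-truth pose and penalizing the whole-body prediction for deviating from it.

```python
import numpy as np

# Hypothetical keypoint counts per part; illustrative values only.
PART_SIZES = {"body": 13, "left_hand": 21, "right_hand": 21, "face": 84}


def merge_expert_outputs(expert_poses):
    """Concatenate per-part 3D keypoints into one pseudo ground-truth pose.

    expert_poses maps a part name to a (K_part, 3) array, or omits the
    part (or maps it to None) when that expert detected nothing.
    Returns (pose, mask); mask marks keypoints that carry a pseudo label.
    """
    poses, masks = [], []
    for part, k in PART_SIZES.items():
        est = expert_poses.get(part)
        if est is None:
            # No detection from this expert: placeholder values, masked out.
            poses.append(np.zeros((k, 3)))
            masks.append(np.zeros(k, dtype=bool))
        else:
            est = np.asarray(est, dtype=float)
            assert est.shape == (k, 3), f"unexpected shape for {part}"
            poses.append(est)
            masks.append(np.ones(k, dtype=bool))
    return np.concatenate(poses), np.concatenate(masks)


def distillation_loss(student_pose, pseudo_pose, mask):
    """Mean per-keypoint L1 distance over keypoints with a pseudo label.

    The L1 distance is an assumption of this sketch, not a detail
    taken from the abstract.
    """
    if not mask.any():
        return 0.0
    diff = np.abs(student_pose[mask] - pseudo_pose[mask])
    return float(diff.sum(axis=1).mean())


if __name__ == "__main__":
    # Only the body and face experts fire on this (synthetic) image.
    experts = {"body": np.ones((13, 3)), "face": np.zeros((84, 3))}
    pseudo, mask = merge_expert_outputs(experts)
    student = np.zeros_like(pseudo)
    print(pseudo.shape, int(mask.sum()), distillation_loss(student, pseudo, mask))
```

Masking out parts with no expert detection keeps the whole-body network from being penalized against placeholder values, which is one plausible way to handle training images with partial or no annotation.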
