PoseNet3D: Learning Temporally Consistent 3D Human Pose via Knowledge Distillation

Paper Title

PoseNet3D: Learning Temporally Consistent 3D Human Pose via Knowledge Distillation

Authors

Shashank Tripathi, Siddhant Ranade, Ambrish Tyagi, Amit Agrawal

Abstract

Recovering 3D human pose from 2D joints is a highly unconstrained problem. We propose a novel neural network framework, PoseNet3D, that takes 2D joints as input and outputs 3D skeletons and SMPL body model parameters. By casting our learning approach in a student-teacher framework, we avoid using any 3D data such as paired/unpaired 3D data, motion capture sequences, depth images or multi-view images during training. We first train a teacher network that outputs 3D skeletons, using only 2D poses for training. The teacher network distills its knowledge to a student network that predicts 3D pose in SMPL representation. Finally, both the teacher and the student networks are jointly fine-tuned in an end-to-end manner using temporal, self-consistency and adversarial losses, improving the accuracy of each individual network. Results on the Human3.6M dataset for 3D human pose estimation demonstrate that our approach reduces the 3D joint prediction error by 18% compared to previous unsupervised methods. Qualitative results on in-the-wild datasets show that the recovered 3D poses and meshes are natural, realistic, and flow smoothly over consecutive frames.
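The abstract names the training signals (teacher-to-student distillation, self-consistency, temporal smoothness) but not their exact form. The following Python sketch shows, under simplifying assumptions, how a distillation term, a 2D reprojection consistency term and a temporal smoothness term could be combined over one short sequence. Everything here is a hypothetical illustration, not the authors' formulation: the function names, the orthographic camera, the squared-error form of each term and the loss weights are all assumptions. The adversarial loss and the step that regresses joints from the student's SMPL output are omitted; the student poses are assumed to already be 3D joint lists.

```python
from typing import List, Sequence, Tuple

Joint3D = Tuple[float, float, float]
Joint2D = Tuple[float, float]
Pose3D = Sequence[Joint3D]
Pose2D = Sequence[Joint2D]


def mean_sq_dist(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> float:
    """Mean over joints of the squared Euclidean distance between paired joints."""
    if len(a) != len(b) or not a:
        raise ValueError("poses must be non-empty and have the same number of joints")
    total = 0.0
    for p, q in zip(a, b):
        total += sum((pi - qi) ** 2 for pi, qi in zip(p, q))
    return total / len(a)


def project_orthographic(pose: Pose3D) -> List[Joint2D]:
    """Hypothetical camera model: orthographic projection that simply drops depth."""
    return [(x, y) for x, y, _ in pose]


def distillation_loss(teacher: Pose3D, student: Pose3D) -> float:
    """Student joints (assumed already regressed from SMPL) mimic the teacher skeleton."""
    return mean_sq_dist(teacher, student)


def reprojection_loss(pose3d: Pose3D, input2d: Pose2D) -> float:
    """Simplified consistency term: the 3D pose should project back onto the 2D input."""
    return mean_sq_dist(project_orthographic(pose3d), input2d)


def temporal_loss(sequence: Sequence[Pose3D]) -> float:
    """Penalize joint displacement between consecutive frames (a smoothness prior)."""
    if len(sequence) < 2:
        return 0.0
    diffs = [mean_sq_dist(prev, cur) for prev, cur in zip(sequence, sequence[1:])]
    return sum(diffs) / len(diffs)


def total_loss(teacher_seq, student_seq, inputs2d, w_kd=1.0, w_rep=1.0, w_temp=0.1):
    """Weighted sum of the three terms; the weights are placeholders, not paper values."""
    n = len(student_seq)
    kd = sum(distillation_loss(t, s) for t, s in zip(teacher_seq, student_seq)) / n
    rep = sum(reprojection_loss(s, x) for s, x in zip(student_seq, inputs2d)) / n
    return w_kd * kd + w_rep * rep + w_temp * temporal_loss(student_seq)


# Usage: two frames, two joints; the student drifts on joint 1 in the second frame.
teacher = [[(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)], [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]]
student = [[(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)], [(0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]]
inputs2d = [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 0.0), (1.0, 0.0)]]
print(total_loss(teacher, student, inputs2d))
```

In this toy example the drifting joint is penalized three times: it disagrees with the teacher, it no longer projects onto the 2D input, and it jumps between frames. Keeping the terms as separate functions makes it easy to reweight or swap any one of them, which is the kind of ablation a student-teacher setup invites.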
