Paper Title
Embodied Scene-aware Human Pose Estimation
Paper Authors
Paper Abstract
We propose embodied scene-aware human pose estimation, where we estimate 3D poses based on a simulated agent's proprioception and scene awareness, along with external third-person observations. Unlike prior methods that often resort to multi-stage optimization, non-causal inference, and complex contact modeling to estimate human pose and human-scene interactions, our method is one-stage, causal, and recovers global 3D human poses in a simulated environment. Since 2D third-person observations are coupled with the camera pose, we propose to disentangle the camera pose and use a multi-step projection gradient defined in the global coordinate frame as the movement cue for our embodied agent. Leveraging a physics simulation and pre-scanned scenes (e.g., 3D meshes), we simulate our agent in everyday environments (library, office, bedroom, etc.) and equip it with environmental sensors to intelligently navigate and interact with the geometry of the scene. Our method also relies only on 2D keypoints and can be trained on synthetic datasets derived from popular human motion databases. For evaluation, we use the popular H36M and PROX datasets and achieve high-quality pose estimation on the challenging PROX dataset without ever using PROX motion sequences for training. Code and videos are available on the project page.
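The abstract's central mechanism, the multi-step projection gradient, can be illustrated with a minimal sketch. This is not the authors' code: it assumes a simple pinhole camera model, and all function names (project, multi_step_projection_gradient) and parameter choices here are hypothetical. The idea is to take a few gradient steps on the 2D reprojection error and collect the per-step gradients with respect to the joints in the global frame, yielding a movement cue that is expressed in world coordinates rather than in the camera frame.

```python
# Hedged sketch (not the paper's implementation): a multi-step projection
# gradient in the global coordinate frame, assuming a pinhole camera model.
import torch

def project(joints_3d, R, t, K):
    """Project global 3D joints (J, 3) to 2D pixels using extrinsics (R, t)
    and intrinsics K -- a standard pinhole model, assumed for illustration."""
    cam = joints_3d @ R.T + t          # world frame -> camera frame
    uvw = cam @ K.T                    # camera frame -> homogeneous pixels
    return uvw[:, :2] / uvw[:, 2:3]    # perspective divide

def multi_step_projection_gradient(joints_3d, kp_2d, R, t, K, steps=5, lr=1e-2):
    """Take a few gradient steps on the 2D reprojection error and return the
    per-step gradients w.r.t. the global 3D joints. Stacking the gradients
    gives a camera-disentangled movement cue (illustrative feature only)."""
    x = joints_3d.clone().requires_grad_(True)
    grads = []
    for _ in range(steps):
        loss = ((project(x, R, t, K) - kp_2d) ** 2).sum()
        (g,) = torch.autograd.grad(loss, x)
        grads.append(g)
        x = (x - lr * g).detach().requires_grad_(True)  # one gradient step
    return torch.stack(grads)  # (steps, J, 3), defined in world coordinates

# Toy usage with random values standing in for real detections/calibration;
# the translation pushes the joints in front of the camera.
J = 12
cue = multi_step_projection_gradient(
    torch.randn(J, 3), torch.randn(J, 2),
    torch.eye(3), torch.tensor([0.0, 0.0, 5.0]), torch.eye(3))
print(cue.shape)  # torch.Size([5, 12, 3])
```

Because the gradients are taken with respect to joints in the global frame, the resulting cue is insensitive to the camera pose itself, which is one reading of the abstract's claim about disentangling the camera pose from the third-person observations.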