Title
Human Mesh Recovery from Multiple Shots
Authors
Abstract
Videos from edited media like movies are a useful, yet under-explored, source of information. The rich variety of appearance and interactions between humans depicted over a large temporal context in these films could be a valuable source of data. However, the richness of the data comes at the expense of fundamental challenges, such as abrupt shot changes and close-up shots of actors with heavy truncation, which limit the applicability of existing methods for 3D human understanding. In this paper, we address these limitations with the insight that while shot changes of the same scene incur a discontinuity between frames, the 3D structure of the scene still changes smoothly. This allows us to treat frames before and after a shot change as a multi-view signal that provides strong cues for recovering the 3D state of the actors. We propose a multi-shot optimization framework that leads to improved 3D reconstruction and to the mining of long sequences with pseudo-ground-truth 3D human meshes. We show that the resulting data is beneficial for training various human mesh recovery models: for single images, we achieve improved robustness; for video, we propose a pure transformer-based temporal encoder that can naturally handle observations missing because of shot changes in the input frames. We demonstrate the importance of this insight and of the proposed models through extensive experiments. The tools we develop open the door to processing and analyzing 3D content from large libraries of edited media, which could be helpful for many downstream applications. Project page: https://geopavlakos.github.io/multishot
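The abstract says a transformer-based temporal encoder "can naturally handle missing observations" caused by shot changes. One common way attention achieves this is to mask the missing frames out as keys, so each output is built only from observed frames. The sketch below illustrates that idea. It is a minimal, hypothetical example and not the paper's architecture. The function name `masked_self_attention`, the zero placeholder token, and the omission of learned projections and positional encodings are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def masked_self_attention(
    frames: Sequence[Sequence[float]], observed: Sequence[bool]
) -> List[List[float]]:
    """Single-head dot-product self-attention in which unobserved frames are
    masked out as keys/values. Every output (including outputs at missing
    positions) is therefore a weighted average of observed frame features only.

    Assumptions (illustrative, not the paper's model): Q = K = V = the input
    features (no learned projections, no positional encodings), and missing
    frames are fed in as a zero placeholder token so they still get an output.
    """
    if not any(observed):
        raise ValueError("at least one frame must be observed")
    dim = len(frames[0])
    # Hypothetical choice: replace each missing frame with a zero "mask token".
    tokens = [list(f) if obs else [0.0] * dim for f, obs in zip(frames, observed)]
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for q in tokens:
        # Scaled dot-product scores; masked (missing) keys get -inf.
        scores = [
            sum(a * b for a, b in zip(q, k)) * scale if obs else -math.inf
            for k, obs in zip(tokens, observed)
        ]
        m = max(scores)  # finite, since at least one frame is observed
        # Numerically stable softmax; masked keys receive weight exactly 0.
        weights = [math.exp(s - m) if obs else 0.0 for s, obs in zip(scores, observed)]
        total = sum(weights)
        weights = [w / total for w in weights]
        outputs.append([sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(dim)])
    return outputs


# Usage: frame 2 is missing (e.g., the actor is off-screen after a cut).
# Its content is ignored, and its output is filled in from frames 0 and 1.
out = masked_self_attention([[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]], [True, True, False])
```

Because this sketch has no positional encodings, every missing position receives the same output: a uniform average of the observed frames. A real temporal encoder would add positional information so that each missing frame can be inferred from its own temporal neighborhood.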