FroDO: From Detections to 3D Objects

Paper title

FroDO: From Detections to 3D Objects

Authors

Kejie Li, Martin Rünz, Meng Tang, Lingni Ma, Chen Kong, Tanner Schmidt, Ian Reid, Lourdes Agapito, Julian Straub, Steven Lovegrove, Richard Newcombe

Abstract

Object-oriented maps are important for scene understanding since they jointly capture geometry and semantics, allow individual instantiation and meaningful reasoning about objects. We introduce FroDO, a method for accurate 3D reconstruction of object instances from RGB video that infers object location, pose and shape in a coarse-to-fine manner. Key to FroDO is to embed object shapes in a novel learnt space that allows seamless switching between sparse point cloud and dense DeepSDF decoding. Given an input sequence of localized RGB frames, FroDO first aggregates 2D detections to instantiate a category-aware 3D bounding box per object. A shape code is regressed using an encoder network before optimizing shape and pose further under the learnt shape priors using sparse and dense shape representations. The optimization uses multi-view geometric, photometric and silhouette losses. We evaluate on real-world datasets, including Pix3D, Redwood-OS, and ScanNet, for single-view, multi-view, and multi-object reconstruction.
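
As a rough illustration of the first stage described in the abstract (aggregating 2D detections from localized frames into a 3D object position), the sketch below triangulates a single 3D point from the centers of 2D detection boxes seen in several views. This is a standard linear (DLT) triangulation, swapped in for illustration only; the abstract does not say how FroDO performs the aggregation. The function names `triangulate_center` and `project`, the intrinsics `K`, and the two camera poses are all hypothetical.

```python
import numpy as np

def triangulate_center(projections, pixels):
    """Linear (DLT) triangulation of one 3D point from N >= 2 views.

    projections: list of 3x4 camera matrices P = K [R | t], assumed known
                 because the input frames are localized.
    pixels:      list of (u, v) 2D detection box centers, one per view.
    """
    rows = []
    for P, (u, v) in zip(projections, pixels):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point X into pixel coordinates with camera matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Hypothetical setup: shared intrinsics and two cameras one unit apart along x.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

# Synthetic object center and its noise-free detections in both views.
X_true = np.array([0.5, 0.2, 4.0])
centers = [project(P1, X_true), project(P2, X_true)]

X_est = triangulate_center([P1, P2], centers)
print(X_est)
```

With real detections the box centers are noisy and do not correspond exactly to one 3D point, so an estimate like this would only serve as a coarse initialization for the later shape and pose optimization.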
