Paper title
FroDO: From Detections to 3D Objects
Authors
Abstract
Object-oriented maps are important for scene understanding since they jointly capture geometry and semantics, allow individual instantiation and meaningful reasoning about objects. We introduce FroDO, a method for accurate 3D reconstruction of object instances from RGB video that infers object location, pose and shape in a coarse-to-fine manner. Key to FroDO is to embed object shapes in a novel learnt space that allows seamless switching between sparse point cloud and dense DeepSDF decoding. Given an input sequence of localized RGB frames, FroDO first aggregates 2D detections to instantiate a category-aware 3D bounding box per object. A shape code is regressed using an encoder network before optimizing shape and pose further under the learnt shape priors using sparse and dense shape representations. The optimization uses multi-view geometric, photometric and silhouette losses. We evaluate on real-world datasets, including Pix3D, Redwood-OS, and ScanNet, for single-view, multi-view, and multi-object reconstruction.
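As a rough illustration of the first stage described in the abstract (turning 2D detections from several localized frames into a 3D object position), the sketch below finds the least-squares intersection of viewing rays through the detection centers. This is not FroDO's implementation: it assumes the camera centers and the ray directions through each detection are already expressed in world coordinates, it recovers only a center point rather than a full bounding box, and the names `triangulate_center`, `solve3`, `det3`, and `normalize` are hypothetical helpers introduced here.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve the 3x3 system a @ x = b with Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("rays are (nearly) parallel; more baseline is needed")
    solution = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        solution.append(det3(m) / d)
    return solution


def triangulate_center(centers, directions):
    """Point minimizing the summed squared distance to all rays.

    Each ray starts at a camera center c and points along d (assumed inputs).
    With P = I - d d^T, the minimizer x satisfies (sum P) x = sum (P c).
    """
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for c, d in zip(centers, directions):
        d = normalize(d)
        for r in range(3):
            for k in range(3):
                p = (1.0 if r == k else 0.0) - d[r] * d[k]
                a[r][k] += p
                b[r] += p * c[k]
    return solve3(a, b)


# Toy data: three cameras whose detection rays all pass through `target`.
target = [1.0, 2.0, 5.0]
cams = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 3.0, 1.0]]
rays = [[t - c for t, c in zip(target, cam)] for cam in cams]
estimate = triangulate_center(cams, rays)
```

With noise-free rays, `estimate` should match `target` up to floating-point error. With real detections the rays no longer meet exactly, and the result is only a coarse starting point of the kind that the later shape and pose optimization would refine.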