Paper Title
Panoptic Lifting for 3D Scene Understanding with Neural Fields
Paper Authors
Paper Abstract
We propose Panoptic Lifting, a novel approach for learning panoptic 3D volumetric representations from images of in-the-wild scenes. Once trained, our model can render color images together with 3D-consistent panoptic segmentation from novel viewpoints. Unlike existing approaches which use 3D input directly or indirectly, our method requires only machine-generated 2D panoptic segmentation masks inferred from a pre-trained network. Our core contribution is a panoptic lifting scheme based on a neural field representation that generates a unified and multi-view consistent, 3D panoptic representation of the scene. To account for inconsistencies of 2D instance identifiers across views, we solve a linear assignment with a cost based on the model's current predictions and the machine-generated segmentation masks, thus enabling us to lift 2D instances to 3D in a consistent way. We further propose and ablate contributions that make our method more robust to noisy, machine-generated labels, including test-time augmentations for confidence estimates, segment consistency loss, bounded segmentation fields, and gradient stopping. Experimental results validate our approach on the challenging Hypersim, Replica, and ScanNet datasets, improving by 8.4%, 13.8%, and 10.6% in scene-level PQ over the state of the art.
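The linear-assignment step described in the abstract can be made concrete with a short sketch. The Python snippet below is a minimal illustration, not the paper's implementation: it matches the machine-generated 2D instance IDs of one frame to the model's persistent 3D instance slots with the Hungarian algorithm (scipy's linear_sum_assignment). The function name match_frame_instances, the negative-mean-probability cost, and the unlabeled_id convention are assumptions for illustration; the paper only specifies that a linear assignment is solved with a cost based on the model's current predictions and the machine-generated masks.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_frame_instances(pred_probs, seg_mask, unlabeled_id=-1):
    """Match machine-generated 2D instance IDs in one frame to the model's
    persistent 3D instance slots (hypothetical helper, for illustration).

    pred_probs: (H, W, K) per-pixel probabilities rendered from the model's
                K 3D instance slots.
    seg_mask:   (H, W) machine-generated 2D instance IDs for this frame.
    Returns:    dict mapping each 2D instance ID to a 3D slot index.
    """
    ids_2d = [i for i in np.unique(seg_mask) if i != unlabeled_id]
    num_slots = pred_probs.shape[-1]
    # Cost of pairing 2D instance g with slot k: negative mean probability
    # the model assigns to slot k inside g's mask, so the assignment prefers
    # slots that already explain that region (an assumed cost, not the paper's).
    cost = np.empty((len(ids_2d), num_slots))
    for row, gid in enumerate(ids_2d):
        cost[row] = -pred_probs[seg_mask == gid].mean(axis=0)
    rows, cols = linear_sum_assignment(cost)  # one-to-one optimal matching
    return {int(ids_2d[r]): int(c) for r, c in zip(rows, cols)}

# Toy usage: a 4x4 frame, 3 instance slots, two detected 2D instances (IDs 1, 2).
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(3), size=(4, 4))  # (4, 4, 3), sums to 1 per pixel
mask = np.array([[1, 1, 2, 2]] * 4)
print(match_frame_instances(probs, mask))       # e.g. {1: 0, 2: 2}
```

Because the matching is one-to-one per frame, each 2D instance is tied to a single 3D slot, which is how inconsistent per-view instance identifiers can be lifted to a single consistent 3D instance decomposition.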