Paper Title
SST: Real-time End-to-end Monocular 3D Reconstruction via Sparse Spatial-Temporal Guidance
Paper Authors
Paper Abstract
Real-time monocular 3D reconstruction is a challenging problem that remains unsolved. Although recent end-to-end methods have demonstrated promising results, they can hardly capture tiny structures and geometric boundaries, because their supervision neglects spatial details and their oversimplified feature fusion ignores temporal cues. To address these problems, we propose SST, an end-to-end 3D reconstruction network that utilizes Sparse points estimated by a visual SLAM system as additional Spatial guidance and fuses Temporal features via a novel cross-modal attention mechanism, achieving more detailed reconstruction results. We propose a Local Spatial-Temporal Fusion module that exploits more informative spatial-temporal cues from multi-view color information and sparse priors, as well as a Global Spatial-Temporal Fusion module that refines the local TSDF volumes with the world-frame model from coarse to fine. Extensive experiments on ScanNet and 7-Scenes demonstrate that SST outperforms all state-of-the-art competitors while maintaining a high inference speed of 59 FPS, enabling real-world applications with real-time requirements.
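The abstract's central fusion idea is cross-modal attention: features from one modality (e.g. image features) query features from another (e.g. features attached to the sparse SLAM points). Below is a minimal, dependency-free sketch of single-head scaled dot-product cross-attention to illustrate that mechanism; the function name, toy data, and pure-Python style are illustrative assumptions, not the paper's actual module.

```python
import math

def cross_modal_attention(queries, keys, values):
    """Toy single-head cross-attention (illustrative only, not SST's module).

    `queries` come from one modality, `keys`/`values` from another;
    each output row is a softmax-weighted mix of the value rows.
    All arguments are lists of equal-length feature vectors.
    """
    d = len(keys[0])  # key dimension, for the 1/sqrt(d) scaling
    out = []
    for q in queries:
        # Scaled dot-product scores of this query against every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        # Numerically stable softmax over the scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Attention output: convex combination of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out
```

For instance, a query aligned with the first key attends mostly to the first value vector, which is how image-feature queries could pull in depth cues stored at matching sparse points.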