Title


VoxGRAF: Fast 3D-Aware Image Synthesis with Sparse Voxel Grids

Authors

Katja Schwarz, Axel Sauer, Michael Niemeyer, Yiyi Liao, Andreas Geiger

Abstract


State-of-the-art 3D-aware generative models rely on coordinate-based MLPs to parameterize 3D radiance fields. While demonstrating impressive results, querying an MLP for every sample along each ray leads to slow rendering. Therefore, existing approaches often render low-resolution feature maps and process them with an upsampling network to obtain the final image. Albeit efficient, neural rendering often entangles viewpoint and content such that changing the camera pose results in unwanted changes of geometry or appearance. Motivated by recent results in voxel-based novel view synthesis, we investigate the utility of sparse voxel grid representations for fast and 3D-consistent generative modeling in this paper. Our results demonstrate that monolithic MLPs can indeed be replaced by 3D convolutions when combining sparse voxel grids with progressive growing, free space pruning and appropriate regularization. To obtain a compact representation of the scene and allow for scaling to higher voxel resolutions, our model disentangles the foreground object (modeled in 3D) from the background (modeled in 2D). In contrast to existing approaches, our method requires only a single forward pass to generate a full 3D scene. It hence allows for efficient rendering from arbitrary viewpoints while yielding 3D consistent results with high visual fidelity.
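The efficiency argument above rests on rendering directly from an explicit voxel grid: once densities and colors are stored on the grid, rendering a pixel is just alpha compositing of samples along the ray, with no per-sample MLP query. The toy sketch below illustrates that compositing step on a small dense grid; the function name, grid layout, and nearest-neighbor lookup are illustrative assumptions, not VoxGRAF's actual (sparse, CUDA-based) implementation.

```python
import numpy as np

def render_ray(density, color, origin, direction, n_samples=64, step=0.05):
    """Alpha-composite color along one ray through a dense voxel grid.

    density: (R, R, R) non-negative densities
    color:   (R, R, R, 3) RGB values in [0, 1]
    The grid is assumed to occupy the unit cube [0, 1]^3.
    """
    res = density.shape[0]
    rgb = np.zeros(3)
    transmittance = 1.0  # fraction of light not yet absorbed
    for i in range(n_samples):
        p = origin + (i + 0.5) * step * direction
        if np.any(p < 0.0) or np.any(p >= 1.0):
            continue  # sample outside the grid contributes nothing
        idx = tuple((p * res).astype(int))  # nearest-neighbor voxel lookup
        alpha = 1.0 - np.exp(-density[idx] * step)
        rgb += transmittance * alpha * color[idx]
        transmittance *= 1.0 - alpha
    return rgb

# A single high-density red voxel at the grid center.
R = 8
density = np.zeros((R, R, R))
color = np.zeros((R, R, R, 3))
density[4, 4, 4] = 50.0
color[4, 4, 4] = [1.0, 0.0, 0.0]

# A ray through the center accumulates red; one off to the side stays black.
hit = render_ray(density, color, np.array([0.05, 0.55, 0.55]), np.array([1.0, 0.0, 0.0]))
miss = render_ray(density, color, np.array([0.05, 0.05, 0.55]), np.array([1.0, 0.0, 0.0]))
```

This also makes the paper's sparsity point concrete: in real scenes most voxels are empty, so pruning free space lets the renderer skip them entirely, which is what makes scaling to higher grid resolutions feasible.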
