Paper Title
RTMV: A Ray-Traced Multi-View Synthetic Dataset for Novel View Synthesis
Paper Authors
Paper Abstract
We present a large-scale synthetic dataset for novel view synthesis consisting of ~300k images rendered from nearly 2000 complex scenes using high-quality ray tracing at high resolution (1600 x 1600 pixels). The dataset is orders of magnitude larger than existing synthetic datasets for novel view synthesis, thus providing a large unified benchmark for both training and evaluation. Using four distinct sources of high-quality 3D meshes, the scenes of our dataset exhibit challenging variations in camera views, lighting, shape, materials, and textures. Because our dataset is too large for existing methods to process, we propose Sparse Voxel Light Field (SVLF), an efficient voxel-based light field approach for novel view synthesis that achieves comparable performance to NeRF on synthetic data, while being an order of magnitude faster to train and two orders of magnitude faster to render. SVLF achieves this speed by relying on a sparse voxel octree, careful voxel sampling (requiring only a handful of queries per ray), and a reduced network structure, as well as ground truth depth maps at training time. Our dataset is generated by NViSII, a Python-based ray tracing renderer, which is designed to be simple for non-experts to use and share, flexible and powerful through its use of scripting, and able to create high-quality, physically-based rendered images. Experiments with a subset of our dataset allow us to compare standard methods like NeRF and mip-NeRF for single-scene modeling, and pixelNeRF for category-level modeling, pointing toward the need for future improvements in this area.
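To make the sampling strategy in the abstract more concrete, here is a minimal NumPy sketch of the general idea: placing only a handful of sample points per ray in a thin shell around a known surface depth, then compositing them with standard volume rendering. This is an illustrative sketch, not the authors' SVLF implementation; the function names, the query_fn interface, the shell width, and the sample count are all hypothetical assumptions.

```python
import numpy as np

def sample_points_near_surface(ray_origin, ray_dir, depth, voxel_size, n_samples=8):
    """Place a handful of samples in a thin shell around the known surface depth.

    With ground-truth depth available at training time, a few queries per ray
    can replace dense ray marching through empty space.
    """
    # Jitter sample distances within +/- one voxel of the surface hit point.
    offsets = (np.random.rand(n_samples) - 0.5) * 2.0 * voxel_size
    ts = np.sort(depth + offsets)
    points = ray_origin[None, :] + ts[:, None] * ray_dir[None, :]
    return points, ts

def render_ray(ray_origin, ray_dir, depth, query_fn, voxel_size=0.01):
    """Composite a handful of radiance/density queries into a single pixel color.

    query_fn is a user-supplied callable (e.g. a small MLP) that maps sample
    points and a view direction to per-sample RGB and density.
    """
    points, ts = sample_points_near_surface(ray_origin, ray_dir, depth, voxel_size)
    rgb, sigma = query_fn(points, ray_dir)            # small network evaluated per sample
    deltas = np.diff(ts, append=ts[-1] + voxel_size)  # spacing between samples
    alpha = 1.0 - np.exp(-sigma * deltas)             # standard volume-rendering alpha
    transmittance = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = alpha * transmittance
    return (weights[:, None] * rgb).sum(axis=0)
```

At inference time, when ground-truth depth is not available, the abstract suggests the sparse voxel octree supplies the ray-surface intersection instead; the octree lookup is abstracted away entirely in this sketch, which is why the number of network queries per ray stays small in either case.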