Paper Title

Towards Realistic 3D Embedding via View Alignment

Authors

Changgong Zhang, Fangneng Zhan, Shijian Lu, Feiying Ma, Xuansong Xie

Abstract

Recent advances in generative adversarial networks (GANs) have achieved great success in automated image composition, which generates new images by automatically embedding foreground objects of interest into background images. However, most existing works deal with foreground objects in two-dimensional (2D) images, even though foreground objects in three-dimensional (3D) models are more flexible, offering 360-degree view freedom. This paper presents an innovative View Alignment GAN (VA-GAN) that composes new images by embedding 3D models into 2D background images realistically and automatically. VA-GAN consists of a texture generator and a differential discriminator that are inter-connected and end-to-end trainable. The differential discriminator guides the learning of geometric transformations from background images so that the composed 3D models can be aligned with the background images with realistic poses and views. The texture generator adopts a novel view encoding mechanism for generating accurate object textures for the 3D models under the estimated views. Extensive experiments over two synthesis tasks (car synthesis with KITTI and pedestrian synthesis with Cityscapes) show that VA-GAN achieves high-fidelity composition qualitatively and quantitatively as compared with state-of-the-art generation methods.
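At a high level, the composition task the abstract describes has three steps: estimate a view/geometric placement for the 3D model, render its texture under that view, and blend the rendered foreground into the background. The sketch below is only an illustrative assumption of the final blending step, not the authors' VA-GAN implementation; in the paper the placement comes from a learned geometric transformation and the texture from the view-encoded generator, whereas here both are simplified to a fixed offset and a precomputed alpha mask.

```python
import numpy as np

def compose(background, foreground, alpha, offset):
    """Paste a rendered foreground (H_f, W_f, 3) with its alpha mask
    (H_f, W_f) into a background image at the given (row, col) offset.

    Illustrative only: in VA-GAN the placement would come from a learned
    geometric transformation and the foreground from the texture
    generator, not from fixed inputs as assumed here.
    """
    out = background.astype(np.float32).copy()
    r, c = offset
    hf, wf = foreground.shape[:2]
    region = out[r:r + hf, c:c + wf]
    a = alpha[..., None].astype(np.float32)  # broadcast mask over channels
    out[r:r + hf, c:c + wf] = a * foreground + (1.0 - a) * region
    return out.astype(np.uint8)

# Toy example: a white 4x4 "object" pasted into a black 8x8 background.
bg = np.zeros((8, 8, 3), np.uint8)
fg = np.full((4, 4, 3), 255, np.uint8)
mask = np.ones((4, 4), np.float32)
result = compose(bg, fg, mask, (2, 2))
```

A fractional `alpha` would give soft edges at the object boundary, which is what makes the blending differentiable and hence usable inside an end-to-end trainable pipeline like the one the abstract describes.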
