Paper Title
Image GANs meet Differentiable Rendering for Inverse Graphics and Interpretable 3D Neural Rendering
Paper Authors
Paper Abstract
Differentiable rendering has paved the way to training neural networks to perform "inverse graphics" tasks such as predicting 3D geometry from monocular photographs. To train high-performing models, most of the current approaches rely on multi-view imagery, which is not readily available in practice. Recent Generative Adversarial Networks (GANs) that synthesize images, in contrast, seem to acquire 3D knowledge implicitly during training: object viewpoints can be manipulated by simply manipulating the latent codes. However, these latent codes often lack further physical interpretation, and thus GANs cannot easily be inverted to perform explicit 3D reasoning. In this paper, we aim to extract and disentangle 3D knowledge learned by generative models by utilizing differentiable renderers. Key to our approach is to exploit GANs as a multi-view data generator to train an inverse graphics network using an off-the-shelf differentiable renderer, and to use the trained inverse graphics network as a teacher to disentangle the GAN's latent code into interpretable 3D properties. The entire architecture is trained iteratively using cycle consistency losses. We show that our approach significantly outperforms state-of-the-art inverse graphics networks trained on existing datasets, both quantitatively and via user studies. We further showcase the disentangled GAN as a controllable 3D "neural renderer", complementing traditional graphics renderers.
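To make the training cycle described in the abstract concrete, below is a minimal PyTorch-style sketch of one plausible reading of the loop: the GAN synthesizes multiple views of the same object, an inverse graphics network predicts interpretable 3D properties, a differentiable renderer re-renders them, and a mapping network closes the cycle back into the GAN's latent space. All module names (`StyleGANGenerator`, `InverseGraphicsNet`, `DiffRenderer`, `DisentangledMapper`), the latent sampling helper, and the losses are hypothetical placeholders inferred from the abstract, not the authors' released code.

```python
# Hypothetical sketch of the iterative training with cycle consistency.
# Every class and helper below is a placeholder, not the paper's actual API.
import torch

gan = StyleGANGenerator()       # pretrained image GAN, kept frozen (assumption)
inv_net = InverseGraphicsNet()  # predicts mesh, texture, camera from an image
renderer = DiffRenderer()       # off-the-shelf differentiable renderer
mapper = DisentangledMapper()   # maps 3D properties back into GAN latents

opt = torch.optim.Adam(
    list(inv_net.parameters()) + list(mapper.parameters()), lr=1e-4
)
num_steps = 10_000  # placeholder

for step in range(num_steps):
    # 1) GAN as multi-view data generator: fix the content code and vary
    #    the viewpoint code to synthesize views of the same object.
    z_content, z_views = sample_latents()  # hypothetical helper
    images = gan(z_content, z_views)

    # 2) Inverse graphics: predict interpretable 3D properties per view.
    mesh, texture, camera = inv_net(images)

    # 3) Re-render and compare against the GAN images (reconstruction loss).
    rendered = renderer(mesh, texture, camera)
    loss_recon = (rendered - images).abs().mean()

    # 4) Cycle consistency: map the 3D properties back to GAN latents,
    #    regenerate, and require agreement with the original images.
    z_cycle = mapper(mesh, texture, camera)
    loss_cycle = (gan(*z_cycle) - images).abs().mean()

    opt.zero_grad()
    (loss_recon + loss_cycle).backward()
    opt.step()
```

Under this reading, alternating the two losses is what lets the inverse graphics network act as a "teacher" for disentangling the latent space, while the disentangled latents in turn supply cleaner multi-view supervision.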