Object-Centric Image Generation from Layouts

Paper title

Object-Centric Image Generation from Layouts

Authors

Tristan Sylvain, Pengchuan Zhang, Yoshua Bengio, R Devon Hjelm, Shikhar Sharma

Abstract

Despite recent impressive results on single-object and single-domain image generation, the generation of complex scenes with multiple objects remains challenging. In this paper, we start with the idea that a model must be able to understand individual objects and relationships between objects in order to generate complex scenes well. Our layout-to-image generation method, which we call Object-Centric Generative Adversarial Network (or OC-GAN), relies on a novel Scene-Graph Similarity Module (SGSM). The SGSM learns representations of the spatial relationships between objects in the scene, which lead to our model's improved layout fidelity. We also propose changes to the conditioning mechanism of the generator that enhance its object instance-awareness. Apart from improving image quality, our contributions mitigate two failure modes in previous approaches: (1) spurious objects being generated without corresponding bounding boxes in the layout, and (2) overlapping bounding boxes in the layout leading to merged objects in images. Extensive quantitative evaluation and ablation studies demonstrate the impact of our contributions, with our model outperforming previous state-of-the-art approaches on both the COCO-Stuff and Visual Genome datasets. Finally, we address an important limitation of evaluation metrics used in previous works by introducing SceneFID, an object-centric adaptation of the popular Fréchet Inception Distance metric that is better suited for multi-object images.
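To make the SceneFID idea more concrete, here is a minimal Python sketch. It assumes that SceneFID applies the usual Fréchet distance to features of per-object crops cut out with the layout's bounding boxes, rather than to features of whole images. It is not the authors' implementation. The names `frechet_distance`, `crop_objects`, `scene_fid` and `toy_embed` are hypothetical, and `toy_embed` (per-channel mean and standard deviation) is only a placeholder for the Inception-v3 features of resized crops that a real evaluation would use.

```python
import numpy as np


def _psd_sqrt(mat):
    """Symmetric square root of a positive semi-definite matrix via eigh."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # remove tiny negative eigenvalues from round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussian fits of two (n, d) feature arrays:
    ||mu_a - mu_b||^2 + Tr(C_a + C_b - 2 (C_a C_b)^(1/2))."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((C_a C_b)^(1/2)) = Tr((S C_b S)^(1/2)) with S = C_a^(1/2); the inner
    # matrix is symmetric PSD, so its eigenvalues can be taken with eigvalsh.
    s = _psd_sqrt(cov_a)
    inner = np.linalg.eigvalsh(s @ cov_b @ s)
    tr_covmean = np.sqrt(np.clip(inner, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_covmean)


def crop_objects(image, boxes):
    """One crop per layout box; boxes are (x0, y0, x1, y1) in pixels, image is (H, W, C)."""
    return [image[y0:y1, x0:x1] for (x0, y0, x1, y1) in boxes]


def scene_fid(real_images, fake_images, layouts, embed):
    """Object-centric FID sketch: compare per-object crop features, not whole-image features.
    The same layout boxes crop each real image and its generated counterpart."""
    real = [embed(c) for img, boxes in zip(real_images, layouts) for c in crop_objects(img, boxes)]
    fake = [embed(c) for img, boxes in zip(fake_images, layouts) for c in crop_objects(img, boxes)]
    return frechet_distance(np.stack(real), np.stack(fake))


def toy_embed(crop):
    """Hypothetical stand-in for Inception features: per-channel mean and std (6-D for RGB)."""
    return np.concatenate([crop.mean(axis=(0, 1)), crop.std(axis=(0, 1))])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = [rng.random((64, 64, 3)) for _ in range(8)]
    layouts = [[(0, 0, 32, 32), (16, 16, 64, 48), (40, 0, 64, 20)] for _ in range(8)]
    fake = [img + 0.1 for img in real]  # brighten every pixel slightly
    print(scene_fid(real, real, layouts, toy_embed))  # approximately 0
    print(scene_fid(real, fake, layouts, toy_embed))  # approximately 0.03 (3 channel means shift by 0.1)
```

The design choice this illustrates is that each object contributes its own sample to the feature statistics, so a scene with many objects is compared object by object rather than through a single global image embedding.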
