Semantic Layout Manipulation with High-Resolution Sparse Attention

Paper Title

Semantic Layout Manipulation with High-Resolution Sparse Attention

Authors

Haitian Zheng, Zhe Lin, Jingwan Lu, Scott Cohen, Jianming Zhang, Ning Xu, Jiebo Luo

Abstract

We tackle the problem of semantic image layout manipulation, which aims to manipulate an input image by editing its semantic label map. The core problem in this task is how to transfer visual details from the input image to the new semantic layout while keeping the resulting image visually realistic. Recent work on learning cross-domain correspondence has shown promising results for global layout transfer with dense attention-based warping. However, this method tends to lose texture details because of its resolution limitation and the lack of a smoothness constraint on the correspondence. To adapt this paradigm for the layout manipulation task, we propose a high-resolution sparse attention module that effectively transfers visual details to new layouts at resolutions up to 512x512. To further improve visual quality, we introduce a novel generator architecture consisting of a semantic encoder and a two-stage decoder for coarse-to-fine synthesis. Experiments on the ADE20k and Places365 datasets demonstrate that our proposed approach achieves substantial improvements over existing inpainting and layout manipulation methods.
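The abstract contrasts dense attention-based warping with a sparse attention module but gives no implementation details. The following is a minimal sketch of one common way to sparsify attention (keeping only the top-k best-matching source positions per target position), written with NumPy. It is an illustration under stated assumptions, not the authors' module: the function name `sparse_attention_warp`, the parameters `k` and `temperature`, and the use of cosine similarity are all hypothetical choices. For clarity the sketch still builds the full similarity matrix, so it shows only the sparsified weighting, not the memory savings a high-resolution implementation would need.

```python
import numpy as np


def sparse_attention_warp(query_feats, key_feats, source_pixels, k=4, temperature=0.01):
    """Warp source pixels onto target positions with top-k attention.

    query_feats:   (Nq, C) features at target-layout positions (assumed given)
    key_feats:     (Nk, C) features at source-image positions (assumed given)
    source_pixels: (Nk, D) values to transfer, e.g. RGB colors
    Returns the warped values (Nq, D), the chosen indices (Nq, k),
    and the attention weights (Nq, k).
    """
    # Cosine similarity between every target and source position.
    q = query_feats / (np.linalg.norm(query_feats, axis=1, keepdims=True) + 1e-8)
    kf = key_feats / (np.linalg.norm(key_feats, axis=1, keepdims=True) + 1e-8)
    sim = q @ kf.T  # (Nq, Nk), dense here for simplicity

    # Keep only the k most similar source positions per target position.
    top_idx = np.argpartition(-sim, k - 1, axis=1)[:, :k]  # (Nq, k)
    top_sim = np.take_along_axis(sim, top_idx, axis=1)     # (Nq, k)

    # Softmax over the k retained matches (numerically stabilized).
    logits = top_sim / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Weighted average of the retained source values.
    warped = np.einsum("qk,qkd->qd", weights, source_pixels[top_idx])
    return warped, top_idx, weights


if __name__ == "__main__":
    # Toy data: 4 source positions with one-hot features and distinct colors.
    key_feats = np.eye(4)
    source_pixels = np.array([[1.0, 0.0, 0.0],
                              [0.0, 1.0, 0.0],
                              [0.0, 0.0, 1.0],
                              [0.5, 0.5, 0.5]])
    query_feats = key_feats[[2, 0]]  # targets that match sources 2 and 0
    warped, idx, w = sparse_attention_warp(query_feats, key_feats, source_pixels, k=2)
    print(warped)
```

Restricting each target position to a few source matches keeps the warped output from averaging over many dissimilar locations, which is one plausible reason sparse weighting preserves texture better than a dense softmax.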
