Paper Title


Unpaired Image-to-Image Translation via Latent Energy Transport

Paper Authors

Yang Zhao, Changyou Chen

Abstract


Image-to-image translation aims to preserve source contents while translating to discriminative target styles between two visual domains. Most works apply adversarial learning in the ambient image space, which could be computationally expensive and challenging to train. In this paper, we propose to deploy an energy-based model (EBM) in the latent space of a pretrained autoencoder for this task. The pretrained autoencoder serves as both a latent code extractor and an image reconstruction worker. Our model, LETIT, is based on the assumption that two domains share the same latent space, where latent representation is implicitly decomposed as a content code and a domain-specific style code. Instead of explicitly extracting the two codes and applying adaptive instance normalization to combine them, our latent EBM can implicitly learn to transport the source style code to the target style code while preserving the content code, an advantage over existing image translation methods. This simplified solution is also more efficient in the one-sided unpaired image translation setting. Qualitative and quantitative comparisons demonstrate superior translation quality and faithfulness for content preservation. Our model is the first to be applicable to 1024$\times$1024-resolution unpaired image translation to the best of our knowledge.
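The pipeline the abstract describes can be sketched as: encode the source image into a latent code with a pretrained autoencoder, transport that code toward the target domain by gradient-based (Langevin-style) dynamics under a latent EBM, then decode. The sketch below is purely illustrative, not the paper's implementation: the "pretrained" encoder/decoder are fixed random linear maps, and the energy is a toy quadratic with a minimum at a hypothetical target-domain latent mean `mu_t`.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 16  # latent dimensionality (illustrative)

# Stand-ins for a pretrained encoder/decoder (fixed random linear maps).
W_enc = rng.standard_normal((D, 64)) / 8.0
W_dec = W_enc.T

def encode(x):
    return W_enc @ x

def decode(z):
    return W_dec @ z

# Toy latent energy: low energy near the target-domain latent mean mu_t
# (a hypothetical stand-in for a learned EBM).
mu_t = rng.standard_normal(D)

def energy(z):
    return 0.5 * np.sum((z - mu_t) ** 2)

def energy_grad(z):
    return z - mu_t

def langevin_transport(z, steps=100, step_size=0.05, noise=0.0):
    """Transport a latent code toward low energy via (noisy) gradient steps."""
    for _ in range(steps):
        z = z - step_size * energy_grad(z)
        if noise > 0.0:
            z = z + noise * rng.standard_normal(z.shape)
    return z

# Translation: encode source image -> transport latent -> decode.
x_src = rng.standard_normal(64)   # fake "source image"
z_src = encode(x_src)
z_trg = langevin_transport(z_src)
x_trg = decode(z_trg)

print(energy(z_trg) < energy(z_src))  # transport lowers the latent energy
```

Note that in this framing no explicit content/style decomposition or adaptive instance normalization appears anywhere: the transport acts on the whole latent code, which matches the abstract's claim that the split is handled implicitly by the EBM.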
