Latents2Segments: Disentangling the Latent Space of Generative Models for Semantic Segmentation of Face Images

Paper title

Latents2Segments: Disentangling the Latent Space of Generative Models for Semantic Segmentation of Face Images

Authors

Tomar, Snehal Singh; Rajagopalan, A. N.

Abstract

With the growing number of Augmented and Virtual Reality applications that aim to perform meaningful, controlled style edits on images of human faces, the impetus for parsing face images into accurate, fine-grained semantic segmentation maps is stronger than ever. A few state-of-the-art (SOTA) methods that solve this problem do so by incorporating priors on facial structure, or on other face attributes such as expression and pose, into their deep classifier architectures. Our endeavour in this work is to do away with the priors and complex pre-processing operations required by SOTA multi-class face segmentation models. We reframe segmentation as a downstream task that follows the infusion of disentanglement, with respect to facial semantic regions of interest (ROIs), into the latent space of a Generative Autoencoder model. We present results for our model's performance on the CelebAMask-HQ and HELEN datasets. The encoded latent space of our model achieves significantly higher disentanglement with respect to semantic ROIs than that of other SOTA works. Moreover, on the downstream task of semantic segmentation of face images, it achieves a 13% faster inference rate than the publicly available SOTA, with comparable accuracy.
