Paper Title


The Utility of Decorrelating Colour Spaces in Vector Quantised Variational Autoencoders

Authors

Arash Akbarinia, Raquel Gil-Rodríguez, Alban Flachot, Matteo Toscani

Abstract


Vector quantised variational autoencoders (VQ-VAE) are characterised by three main components: 1) encoding visual data, 2) assigning $k$ different vectors in the so-called embedding space, and 3) decoding the learnt features. While images are often represented in RGB colour space, the specific organisation of colours in other spaces also offers interesting features, e.g. CIE L*a*b* decorrelates chromaticity into opponent axes. In this article, we propose colour space conversion, a simple quasi-unsupervised task, to enforce a network to learn structured representations. To this end, we trained several instances of VQ-VAE whose input is an image in one colour space and whose output is in another, e.g. from RGB to CIE L*a*b* (in total five colour spaces were considered). We examined the finite embedding space of trained networks in order to disentangle the colour representation in VQ-VAE models. Our analysis suggests that certain vectors encode hue and others luminance information. We further evaluated the quality of reconstructed images at a low level using pixel-wise colour metrics, and at a high level by inputting them to image classification and scene segmentation networks. We conducted experiments on three benchmark datasets: ImageNet, COCO and CelebA. Our results show that, with respect to the baseline network (whose input and output are RGB), colour conversion to decorrelated spaces obtains a 1-2 Delta-E lower colour difference and 5-10% higher classification accuracy. We also observed that the learnt embedding space is easier to interpret in colour opponent models.
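
The abstract describes training a VQ-VAE whose input is an RGB image and whose reconstruction target is the same image in a decorrelated space such as CIE L*a*b*. The sketch below illustrates that setup; it is not the authors' implementation. TinyVQVAE, rgb_to_lab_tensor, the codebook size, and all hyperparameters are illustrative assumptions, and the colour conversion relies on skimage.color.rgb2lab.

```python
# Minimal sketch (not the paper's code) of the colour-conversion objective:
# the network sees an RGB image and is trained to reconstruct its CIE L*a*b*
# version, so the only "supervision" is a fixed colour-space transform.
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from skimage import color  # rgb2lab, deltaE_ciede2000


class TinyVQVAE(nn.Module):
    """Toy VQ-VAE: conv encoder, k-vector codebook, conv decoder."""

    def __init__(self, k: int = 128, dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, dim, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 4, stride=2, padding=1),
        )
        self.codebook = nn.Embedding(k, dim)  # the finite embedding space
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(dim, dim, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(dim, 3, 4, stride=2, padding=1),
        )

    def quantise(self, z):
        # Nearest codebook vector for every spatial position of z (B, C, H, W).
        b, c, h, w = z.shape
        flat = z.permute(0, 2, 3, 1).reshape(-1, c)
        idx = torch.cdist(flat, self.codebook.weight).argmin(dim=1)
        q = self.codebook(idx).view(b, h, w, c).permute(0, 3, 1, 2)
        # Straight-through estimator so gradients reach the encoder.
        return z + (q - z).detach(), q

    def forward(self, x):
        z = self.encoder(x)
        z_q, q = self.quantise(z)
        return self.decoder(z_q), z, q


def rgb_to_lab_tensor(rgb: torch.Tensor) -> torch.Tensor:
    """Convert a (B, 3, H, W) RGB tensor in [0, 1] to CIE L*a*b*."""
    arr = rgb.permute(0, 2, 3, 1).cpu().numpy()
    lab = np.stack([color.rgb2lab(im) for im in arr])
    return torch.from_numpy(lab).permute(0, 3, 1, 2).float()


model = TinyVQVAE()
opt = torch.optim.Adam(model.parameters(), lr=2e-4)

rgb = torch.rand(4, 3, 64, 64)       # stand-in batch; the paper uses ImageNet, COCO, CelebA
target = rgb_to_lab_tensor(rgb)      # reconstruction target in the decorrelated space

recon, z, q = model(rgb)
loss = (F.mse_loss(recon, target)              # RGB -> L*a*b* reconstruction loss
        + F.mse_loss(q, z.detach())            # codebook (embedding) loss
        + 0.25 * F.mse_loss(z, q.detach()))    # commitment loss
opt.zero_grad(); loss.backward(); opt.step()

# Pixel-wise Delta-E (CIEDE2000) between target and reconstruction, the kind of
# low-level colour metric the abstract refers to (illustrative only here,
# since the model is untrained after a single step).
de = color.deltaE_ciede2000(target.permute(0, 2, 3, 1).numpy(),
                            recon.detach().permute(0, 2, 3, 1).numpy())
print(f"mean Delta-E: {de.mean():.2f}")
```

In practice the Delta-E evaluation would be run on a fully trained model and compared against the RGB-to-RGB baseline described in the abstract.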
