Paper Title
Channel-wise Autoregressive Entropy Models for Learned Image Compression
Paper Authors
Paper Abstract
In learning-based approaches to image compression, codecs are developed by optimizing a computational model to minimize a rate-distortion objective. Currently, the most effective learned image codecs take the form of an entropy-constrained autoencoder with an entropy model that uses both forward and backward adaptation. Forward adaptation makes use of side information and can be efficiently integrated into a deep neural network. In contrast, backward adaptation typically makes predictions based on the causal context of each symbol, which requires serial processing that prevents efficient GPU / TPU utilization. We introduce two enhancements, channel-conditioning and latent residual prediction, that lead to network architectures with better rate-distortion performance than existing context-adaptive models while minimizing serial processing. Empirically, we see an average rate savings of 6.7% on the Kodak image set and 11.4% on the Tecnick image set compared to a context-adaptive baseline model. At low bit rates, where the improvements are most effective, our model saves up to 18% over the baseline and outperforms hand-engineered codecs like BPG by up to 25%.
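For reference, the rate-distortion objective mentioned in the abstract is conventionally written as a Lagrangian that trades off the expected bit rate R of the quantized latents against the reconstruction distortion D. This is the standard form used across the learned-compression literature, not a formula specific to this paper:

$$
L \;=\; \underbrace{\mathbb{E}\!\left[-\log_2 p_{\hat{y}}(\hat{y})\right]}_{R} \;+\; \lambda\,\underbrace{\mathbb{E}\!\left[d(x,\hat{x})\right]}_{D}
$$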
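To make the channel-conditioning idea concrete, the sketch below decodes the latent tensor as a small number of channel slices, each conditioned on the slices decoded before it, with a hook for latent residual prediction (LRP). This is a toy illustration under stated assumptions, not the paper's architecture: `predict_mean`, `predict_residual`, `decode_channel_wise`, and the slice count of 4 are all hypothetical stand-ins for learned components.

```python
# A minimal sketch (not the authors' implementation) of channel-conditioned
# decoding with a latent residual prediction hook.
import numpy as np

def predict_mean(side_info, decoded_slices):
    """Toy stand-in for a learned network that predicts the mean of the
    current channel slice from hyperprior side information plus every
    previously decoded slice (the channel-wise causal context)."""
    context = np.concatenate([side_info] + decoded_slices, axis=0)
    return context.mean(axis=0, keepdims=True)

def predict_residual(side_info, decoded_slices, y_hat_slice):
    """Toy stand-in for the LRP network, which estimates the quantization
    error of the current slice so it can be added back before synthesis.
    Here it simply predicts zero."""
    return np.zeros_like(y_hat_slice)

def decode_channel_wise(y, side_info, num_slices=4):
    """Quantize/decode channel slices in order. Each slice conditions on the
    slices decoded before it, so only `num_slices` serial steps are needed,
    versus one step per spatial location for a spatial context model."""
    slices = np.split(y, num_slices, axis=0)  # split along the channel axis
    decoded = []
    for y_slice in slices:
        mu = predict_mean(side_info, decoded)       # forward + channel context
        y_hat = np.round(y_slice - mu) + mu         # quantize the residual
        y_hat += predict_residual(side_info, decoded, y_hat)  # LRP correction
        decoded.append(y_hat)
    return np.concatenate(decoded, axis=0)

# Toy usage: 8 latent channels over a 4x4 spatial grid.
y = np.random.randn(8, 4, 4).astype(np.float32)
z = np.random.randn(1, 4, 4).astype(np.float32)  # hyperprior side information
print(decode_channel_wise(y, z).shape)  # (8, 4, 4)
```

The design point the sketch highlights is that the serial loop runs over a handful of channel groups rather than over every spatial location, which is what allows efficient GPU / TPU utilization relative to spatially autoregressive context models.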