Preprocessing Enhanced Image Compression for Machine Vision

Paper Title

Preprocessing Enhanced Image Compression for Machine Vision

Authors

Lu, Guo; Ge, Xingtong; Zhong, Tianxiong; Geng, Jing; Hu, Qiang

Abstract

Recently, more and more images are compressed and sent to back-end devices for machine analysis tasks (e.g., object detection) instead of being viewed only by humans. However, most traditional or learned image codecs are designed to minimize distortion as perceived by the human visual system, without considering the growing demand from machine vision systems. In this work, we propose a preprocessing enhanced image compression method for machine vision tasks to address this challenge. Instead of relying on learned image codecs for end-to-end optimization, our framework is built upon traditional non-differentiable codecs, which means it is standard-compatible and can be easily deployed in practical applications. Specifically, we propose a neural preprocessing module placed before the encoder that preserves the semantic information useful for the downstream tasks and suppresses irrelevant information to save bitrate. Furthermore, our neural preprocessing module is quantization-adaptive and can be used at different compression ratios. More importantly, to jointly optimize the preprocessing module with the downstream machine vision tasks, we introduce a proxy network that stands in for the traditional non-differentiable codec in the back-propagation stage. We provide extensive experiments by evaluating our compression method on two representative downstream tasks with different backbone networks. Experimental results show that our method achieves a better trade-off between the coding bitrate and the performance of the downstream machine vision tasks, saving about 20% bitrate.
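
The training idea in the abstract (run the real codec in the forward pass, take gradients from a differentiable stand-in in the backward pass) can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration and not the paper's implementation: the "codec" is scalar uniform quantization, the "proxy" is the identity map (a straight-through estimate rather than a learned proxy network), the "preprocessing module" is one gain per component, the rate term is an L1 surrogate, and the downstream "task" only cares about the first component.

```python
# Toy illustration only: the "codec" is scalar uniform quantization and the
# "proxy" is the identity map. Neither is the paper's actual component.

def codec(y, q):
    """Non-differentiable stand-in codec: uniform quantization with step q."""
    return [round(v / q) * q for v in y]

def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)

def train_preprocessor(x, q=0.5, lam=0.1, lr=0.05, steps=200):
    """Learn per-component gains w so that codec(w * x) serves a toy task.

    Assumed toy task: only x[0] matters downstream; x[1] is irrelevant.
    """
    w = [1.0] * len(x)
    for _ in range(steps):
        y = [wi * xi for wi, xi in zip(w, x)]   # preprocessing (hypothetical gains)
        dec = codec(y, q)                       # forward pass: real codec
        # Backward pass: d(dec)/d(y) is taken from the identity proxy, i.e. 1.
        d_task = [0.0] * len(x)
        d_task[0] = 2.0 * (dec[0] - x[0])       # gradient of squared task error
        for i in range(len(x)):
            d_rate = lam * sign(y[i]) / q       # gradient of the L1 rate surrogate
            w[i] -= lr * (d_task[i] + d_rate) * x[i]
    return w

x = [2.0, 1.5]                                  # [task-relevant, irrelevant]
w = train_preprocessor(x)
decoded = codec([wi * xi for wi, xi in zip(w, x)], q=0.5)
```

In this toy setting the gain on the irrelevant component is driven toward zero, so it quantizes to nothing and costs no bits, while the task-relevant component stays within one quantization step of its original value. A learned proxy network, as described in the abstract, would replace the identity assumption with gradients that better reflect the real codec.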
