Paper Title
Transform Quantization for CNN (Convolutional Neural Network) Compression
Paper Authors
Paper Abstract
In this paper, we compress convolutional neural network (CNN) weights post-training via transform quantization. Previous CNN quantization techniques tend to ignore the joint statistics of weights and activations, producing sub-optimal CNN performance at a given quantization bit-rate, or consider their joint statistics during training only and do not facilitate efficient compression of already trained CNN models. We optimally transform (decorrelate) and quantize the weights post-training using a rate-distortion framework to improve compression at any given quantization bit-rate. Transform quantization unifies quantization and dimensionality reduction (decorrelation) techniques in a single framework to facilitate low bit-rate compression of CNNs and efficient inference in the transform domain. We first introduce a theory of rate and distortion for CNN quantization, and pose optimum quantization as a rate-distortion optimization problem. We then show that this problem can be solved using optimal bit-depth allocation following decorrelation by the optimal End-to-end Learned Transform (ELT) we derive in this paper. Experiments demonstrate that transform quantization advances the state of the art in CNN compression in both retrained and non-retrained quantization scenarios. In particular, we find that transform quantization with retraining is able to compress CNN models such as AlexNet, ResNet and DenseNet to very low bit-rates (1-2 bits).
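The following is a minimal, hedged sketch of the transform-quantization idea described in the abstract, not the paper's actual method: PCA/KLT decorrelation stands in for the End-to-end Learned Transform (ELT) derived in the paper, and bit-depths are assigned by the classical reverse water-filling rule for Gaussian sources rather than the paper's rate-distortion optimization. The function name `transform_quantize` and all parameters are illustrative assumptions.

```python
import numpy as np

def transform_quantize(W, avg_bits=2.0):
    """Decorrelate the columns of weight matrix W, allocate bit-depths
    by variance, and uniformly quantize in the transform domain.

    NOTE: a sketch only -- PCA/KLT stands in for the paper's ELT, and
    the bit-allocation rule is the standard Gaussian reverse
    water-filling approximation, not the paper's optimization.
    """
    # Decorrelate: eigendecomposition of the weight covariance (KLT).
    C = np.cov(W, rowvar=False)
    _, U = np.linalg.eigh(C)
    Z = W @ U  # transform-domain coefficients

    # Bit allocation: b_i = avg_bits + 0.5*log2(var_i / geo_mean_var),
    # the classical rate-distortion rule for Gaussian sources.
    var = Z.var(axis=0) + 1e-12
    bits = avg_bits + 0.5 * np.log2(var / np.exp(np.mean(np.log(var))))
    bits = np.clip(np.round(bits), 0, 16).astype(int)

    # Uniform scalar quantization, per transform dimension.
    Z_hat = np.empty_like(Z)
    for i, b in enumerate(bits):
        lo, hi = Z[:, i].min(), Z[:, i].max()
        if b == 0 or hi == lo:
            Z_hat[:, i] = 0.0  # dimension pruned entirely
            continue
        step = (hi - lo) / (2 ** b - 1)
        Z_hat[:, i] = np.round((Z[:, i] - lo) / step) * step + lo

    # Inference can stay in the transform domain; invert for reference.
    W_hat = Z_hat @ U.T  # U is orthogonal, so U^{-1} = U^T
    return W_hat, bits

# Usage: quantize a random "layer" to ~2 bits per weight on average.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 64)) @ rng.standard_normal((64, 64))
W_hat, bits = transform_quantize(W, avg_bits=2.0)
print("mean bit-depth:", bits.mean(), " MSE:", np.mean((W - W_hat) ** 2))
```

Because the transform concentrates variance into a few dimensions, the allocation rule spends most of the bit budget there and rounds low-variance dimensions down to zero bits, which is how decorrelation and dimensionality reduction end up unified in one quantization framework.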