Paper Title
Transform Quantization for CNN (Convolutional Neural Network) Compression
Paper Authors
Paper Abstract
In this paper, we compress convolutional neural network (CNN) weights post-training via transform quantization. Previous CNN quantization techniques tend to ignore the joint statistics of weights and activations, producing sub-optimal CNN performance at a given quantization bit-rate, or consider their joint statistics during training only and do not facilitate efficient compression of already trained CNN models. We optimally transform (decorrelate) and quantize the weights post-training using a rate-distortion framework to improve compression at any given quantization bit-rate. Transform quantization unifies quantization and dimensionality reduction (decorrelation) techniques in a single framework to facilitate low bit-rate compression of CNNs and efficient inference in the transform domain. We first introduce a theory of rate and distortion for CNN quantization, and pose optimum quantization as a rate-distortion optimization problem. We then show that this problem can be solved using optimal bit-depth allocation following decorrelation by the optimal End-to-end Learned Transform (ELT) we derive in this paper. Experiments demonstrate that transform quantization advances the state of the art in CNN compression in both retrained and non-retrained quantization scenarios. In particular, we find that transform quantization with retraining is able to compress CNN models such as AlexNet, ResNet and DenseNet to very low bit-rates (1-2 bits).
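The following is a minimal, hedged sketch of the transform-quantization idea described in the abstract, not the paper's actual method: PCA/KLT decorrelation stands in for the End-to-end Learned Transform (ELT) derived in the paper, and bit-depths are assigned by the classical reverse water-filling rule for Gaussian sources rather than the paper's rate-distortion optimization. The function name `transform_quantize` and all parameters are illustrative assumptions.

```python
import numpy as np

def transform_quantize(W, avg_bits=2.0):
    """Decorrelate the columns of weight matrix W, allocate bit-depths
    by variance, and uniformly quantize in the transform domain.

    NOTE: a sketch only -- PCA/KLT stands in for the paper's ELT, and
    the bit-allocation rule is the standard Gaussian reverse
    water-filling approximation, not the paper's optimization.
    """
    # Decorrelate: eigendecomposition of the weight covariance (KLT).
    C = np.cov(W, rowvar=False)
    _, U = np.linalg.eigh(C)
    Z = W @ U  # transform-domain coefficients

    # Bit allocation: b_i = avg_bits + 0.5*log2(var_i / geo_mean_var),
    # the classical rate-distortion rule for Gaussian sources.
    var = Z.var(axis=0) + 1e-12
    bits = avg_bits + 0.5 * np.log2(var / np.exp(np.mean(np.log(var))))
    bits = np.clip(np.round(bits), 0, 16).astype(int)

    # Uniform scalar quantization, per transform dimension.
    Z_hat = np.empty_like(Z)
    for i, b in enumerate(bits):
        lo, hi = Z[:, i].min(), Z[:, i].max()
        if b == 0 or hi == lo:
            Z_hat[:, i] = 0.0  # dimension pruned entirely
            continue
        step = (hi - lo) / (2 ** b - 1)
        Z_hat[:, i] = np.round((Z[:, i] - lo) / step) * step + lo

    # Inference can stay in the transform domain; invert for reference.
    W_hat = Z_hat @ U.T  # U is orthogonal, so U^{-1} = U^T
    return W_hat, bits

# Usage: quantize a random "layer" to ~2 bits per weight on average.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 64)) @ rng.standard_normal((64, 64))
W_hat, bits = transform_quantize(W, avg_bits=2.0)
print("mean bit-depth:", bits.mean(), " MSE:", np.mean((W - W_hat) ** 2))
```

Because the transform concentrates variance into a few dimensions, the allocation rule spends most of the bit budget there and rounds low-variance dimensions down to zero bits, which is how decorrelation and dimensionality reduction end up unified in one quantization framework.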