Paper Title

Kernel Quantization for Efficient Network Compression

Authors

Zhongzhi Yu, Yemin Shi, Tiejun Huang, Yizhou Yu

Abstract

This paper presents a novel network compression framework, Kernel Quantization (KQ), which aims to efficiently convert any pre-trained full-precision convolutional neural network (CNN) model into a low-precision version without significant performance loss. Unlike existing methods that struggle to reduce the bit-width of individual weights, KQ has the potential to improve the compression ratio by taking the convolution kernel as the quantization unit. Inspired by the evolution from weight pruning to filter pruning, we propose to quantize at both the kernel and the weight level. Instead of representing each weight parameter with a low-bit index, we learn a kernel codebook and replace all kernels in a convolution layer with their corresponding low-bit indices. Thus, KQ represents the weight tensor of a convolution layer with low-bit indices plus a kernel codebook of limited size, which enables a significant compression ratio. We then apply 6-bit parameter quantization to the kernel codebook to further reduce redundancy. Extensive experiments on the ImageNet classification task show that KQ needs, on average, 1.05 and 1.62 bits per convolution-layer parameter in VGG and ResNet18, respectively, and achieves state-of-the-art compression ratios with little accuracy loss.
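
To make the kernel-as-quantization-unit idea concrete, here is a minimal sketch, not the authors' exact procedure: it clusters the 3x3 kernels of one convolution layer into a small codebook with plain k-means, stores a low-bit index per kernel, and reconstructs the weight tensor from the codebook. The codebook size `num_codewords`, the k-means initialization, and the helper names `kernel_quantize` / `reconstruct` are assumptions made for this illustration.

```python
# Illustrative kernel-level quantization sketch (assumptions: plain k-means,
# 256 codewords). Each 3x3 kernel is replaced by one low-bit index into a
# shared kernel codebook, and the layer weight is rebuilt from that codebook.
import numpy as np


def kernel_quantize(weight, num_codewords=256, num_iters=20, seed=0):
    """Cluster kernels of a conv weight (out_ch, in_ch, kh, kw) into a codebook.

    Returns (codebook, indices): codebook has shape (num_codewords, kh*kw),
    indices has shape (out_ch * in_ch,), one index per kernel.
    """
    out_ch, in_ch, kh, kw = weight.shape
    kernels = weight.reshape(-1, kh * kw)          # one row per kernel
    rng = np.random.default_rng(seed)
    # Initialize codewords with randomly chosen kernels.
    codebook = kernels[rng.choice(len(kernels), num_codewords, replace=False)].copy()
    indices = np.zeros(len(kernels), dtype=np.int64)
    for _ in range(num_iters):
        # Assign each kernel to its nearest codeword (Euclidean distance).
        dists = ((kernels[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        indices = dists.argmin(axis=1)
        # Update each codeword as the mean of its assigned kernels.
        for c in range(num_codewords):
            members = kernels[indices == c]
            if len(members) > 0:
                codebook[c] = members.mean(axis=0)
    return codebook, indices


def reconstruct(codebook, indices, shape):
    """Rebuild a weight tensor of the original shape from codebook + indices."""
    out_ch, in_ch, kh, kw = shape
    return codebook[indices].reshape(out_ch, in_ch, kh, kw)


if __name__ == "__main__":
    w = np.random.randn(64, 64, 3, 3).astype(np.float32)
    codebook, idx = kernel_quantize(w, num_codewords=256)
    w_hat = reconstruct(codebook, idx, w.shape)
    # With a 256-entry codebook, each 3x3 kernel (nine float32 values) is
    # stored as a single 8-bit index, i.e. under 1 bit per parameter before
    # counting the small shared codebook itself.
    print("reconstruction MSE:", float(((w - w_hat) ** 2).mean()))
    print("index bits per parameter:", 8 / 9)
```

In this sketch the per-parameter cost is dominated by the index bit-width divided by the kernel size, which is why kernel-level indexing can drop below one bit per weight; a further low-bit (e.g. 6-bit) quantization of the codebook entries, as described in the abstract, shrinks the codebook overhead as well.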
