Paper Title

VecQ: Minimal Loss DNN Model Compression With Vectorized Weight Quantization

Paper Authors

Cheng Gong, Yao Chen, Ye Lu, Tao Li, Cong Hao, Deming Chen

Paper Abstract

Quantization has been proven to be an effective method for reducing the computing and/or storage cost of DNNs. However, the trade-off between the quantization bitwidth and the final accuracy is complex and non-convex, which makes it difficult to optimize directly. Minimizing the direct quantization loss (DQL) of the coefficient data is an effective local optimization method, but previous works often neglect accurate control of the DQL, resulting in a higher loss of final DNN model accuracy. In this paper, we propose a novel metric called Vector Loss. Based on this new metric, we develop a new quantization solution called VecQ, which can guarantee minimal direct quantization loss and better model accuracy. In addition, to speed up the proposed quantization process during model training, we accelerate it with a parameterized probability estimation method and a template-based derivation calculation. We evaluate the proposed algorithm on the MNIST, CIFAR, ImageNet, IMDB movie review, and THUCNews text datasets with numerical DNN models. The results demonstrate that our quantization solution is more accurate and effective than state-of-the-art approaches while offering more flexible bitwidth support. Moreover, the evaluation of our quantized models on Saliency Object Detection (SOD) tasks maintains comparable feature extraction quality with up to 16$\times$ weight size reduction.
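
To make the difference between a plain DQL objective and a vector-style loss concrete, below is a minimal NumPy sketch. It is illustrative only: the uniform quantizer, the 3-bit toy setting, and the split of the error into an orientation term and a modulus term are assumptions based on the abstract's description, not the paper's exact VecQ formulation.

import numpy as np

def uniform_quantize(w, bitwidth, scale):
    # Hypothetical quantizer for illustration: map weights to signed integer
    # levels of the given bitwidth, then back to real values with the scale.
    levels = 2 ** (bitwidth - 1) - 1
    q = np.clip(np.round(w / scale), -levels, levels)
    return q * scale

def direct_quantization_loss(w, w_q):
    # DQL: squared L2 distance between the original and the quantized weights.
    return float(np.sum((w - w_q) ** 2))

def vector_loss(w, w_q, eps=1e-12):
    # Illustrative "vector loss": view the layer's weights as one vector and
    # separate the error into an orientation part (misalignment of direction)
    # and a modulus part (mismatch of vector length).
    cos = np.dot(w, w_q) / (np.linalg.norm(w) * np.linalg.norm(w_q) + eps)
    orientation = 1.0 - cos
    modulus = (np.linalg.norm(w) - np.linalg.norm(w_q)) ** 2
    return orientation, modulus

# Toy usage: compare the two losses for 3-bit weights under different scales.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=1024)
for scale in (0.01, 0.02, 0.05):
    w_q = uniform_quantize(w, bitwidth=3, scale=scale)
    print(scale, direct_quantization_loss(w, w_q), vector_loss(w, w_q))

The point of the sketch is only that the two metrics can rank quantizer settings differently; the abstract's claim is that steering quantization by the vector-based metric gives tighter control over the resulting DQL and, ultimately, better model accuracy.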
