Paper Title

BiQGEMM: Matrix Multiplication with Lookup Table for Binary-Coding-based Quantized DNNs

Authors

Yongkweon Jeon, Baeseong Park, Se Jung Kwon, Byeongwook Kim, Jeongin Yun, Dongsoo Lee

Abstract

The number of parameters in deep neural networks (DNNs) is rapidly increasing to support complicated tasks and to improve model accuracy. Correspondingly, the amount of computation and the required memory footprint increase as well. Quantization is an efficient method to address such concerns by compressing DNNs, so that computations are simplified while the required storage footprint is significantly reduced. Unfortunately, commercial CPUs and GPUs do not fully support quantization because only fixed-width data transfers (such as 32 bits) are allowed. As a result, even if weights are quantized into a few bits, CPUs and GPUs cannot access multiple quantized weights without wasting memory bandwidth. The success of quantization in practice therefore relies on an efficient computation engine, especially for matrix multiplication, which is the basic computation kernel in most DNNs. In this paper, we propose a novel matrix multiplication method, called BiQGEMM, dedicated to quantized DNNs. BiQGEMM can access multiple quantized weights simultaneously in one instruction. In addition, BiQGEMM pre-computes intermediate results that are highly redundant when quantization limits the available computation space. Since pre-computed values are stored in lookup tables and reused, BiQGEMM lowers the overall amount of computation. Our extensive experimental results show that BiQGEMM delivers higher performance than conventional schemes when DNNs are quantized.
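
As an illustration of the lookup-table idea described in the abstract, below is a minimal NumPy sketch of matrix-vector multiplication with binary-coded weights. It assumes a single binary basis per weight matrix (W ≈ αB with B ∈ {-1, +1} and one scale α per row) and a group size of 8 activations per table; the names MU, build_lut, and lut_matvec are illustrative, not from the paper, and the sketch omits the multi-bit and batched cases that BiQGEMM also covers.

```python
import numpy as np

# Minimal sketch of lookup-table-based matrix-vector multiplication for
# binary-coded quantized weights: W ~ alpha * B with B in {-1, +1}.
# MU, build_lut, and lut_matvec are illustrative names, not from the paper.

MU = 8  # activation elements covered by one lookup table


def build_lut(x_group):
    """Precompute all 2**MU signed sums of a length-MU activation group.

    Entry k holds sum_j s_j * x_group[j], where s_j = +1 if bit j of k is
    set and -1 otherwise.  Every weight row reuses these entries instead of
    recomputing its own partial dot product with this group.
    """
    lut = np.zeros(2 ** MU, dtype=x_group.dtype)
    for k in range(2 ** MU):
        signs = np.array([1.0 if (k >> j) & 1 else -1.0 for j in range(MU)])
        lut[k] = signs @ x_group
    return lut


def lut_matvec(codes, alpha, x):
    """Compute y = (alpha * B) @ x with B given as packed MU-bit codes.

    codes[r, g] is an integer whose MU bits encode the {-1, +1} pattern of
    row r over activation group g; alpha[r] is the per-row scaling factor.
    """
    rows, groups = codes.shape
    y = np.zeros(rows, dtype=x.dtype)
    for g in range(groups):
        lut = build_lut(x[g * MU:(g + 1) * MU])  # one table per group
        y += lut[codes[:, g]]                    # one gather per weight row
    return alpha * y


# Usage example: random binary weights, checked against a dense matmul.
rng = np.random.default_rng(0)
rows, cols = 16, 32
B = rng.choice([-1.0, 1.0], size=(rows, cols))
alpha = rng.random(rows)
x = rng.standard_normal(cols)

# Pack each row's sign pattern into MU-bit integer codes, group by group.
bits = (B > 0).astype(np.int64).reshape(rows, cols // MU, MU)
codes = (bits << np.arange(MU)).sum(axis=2)

assert np.allclose(lut_matvec(codes, alpha, x), (alpha[:, None] * B) @ x)
```

The design point the sketch makes concrete: each table costs 2^MU partial sums to build, but it is shared by every row of the weight matrix, so for the tall weight matrices typical of DNN layers the per-row work collapses to one table lookup and one addition per group of MU columns.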
