Paper Title

Weight Equalizing Shift Scaler-Coupled Post-training Quantization

Authors

Jihun Oh, SangJeong Lee, Meejeong Park, Pooni Walagaurav, Kiseok Kwon

Abstract

Post-training, layer-wise quantization is preferable because it requires no retraining and is hardware-friendly. Nevertheless, accuracy degradation occurs when a neural network model has large differences among its per-output-channel weight ranges. In particular, the MobileNet family suffers a drastic drop in top-1 accuracy, from 70.60% ~ 71.87% to 0.1%, on the ImageNet dataset after 8-bit weight quantization. To mitigate this significant accuracy reduction, we propose a new weight equalizing shift scaler that rescales the weight range of each channel by a 4-bit binary shift prior to layer-wise quantization. To recover the original output range, the inverse binary shift is efficiently fused into the existing per-layer scale compounding in the fixed-point convolutional operator of the custom neural processing unit. The binary shift is the key feature of our algorithm: it significantly improves accuracy without increasing the memory footprint. As a result, our proposed method achieves a top-1 accuracy of 69.78% ~ 70.96% on MobileNets and shows robust performance across different network models and tasks, which is competitive with channel-wise quantization results.
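Since the abstract only outlines the method, the NumPy sketch below illustrates one plausible way the per-channel 4-bit binary shift could equalize weight ranges before an ordinary symmetric 8-bit layer-wise quantization. All function and variable names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch, assuming symmetric 8-bit layer-wise weight quantization.
# Names (equalize_and_quantize, shift_bits, etc.) are hypothetical.
import numpy as np

def equalize_and_quantize(weights, shift_bits=4, num_bits=8):
    """weights: float array of shape (out_channels, ...).
    Returns int8 weights, one per-layer scale, and the per-channel shifts."""
    out_ch = weights.shape[0]
    flat = weights.reshape(out_ch, -1)

    per_ch_max = np.max(np.abs(flat), axis=1) + 1e-12   # per-out-channel range
    layer_max = per_ch_max.max()                         # per-layer range

    # Left shift (power of two) that raises each small channel toward the
    # layer range; constrained to fit in 4 bits.
    shifts = np.floor(np.log2(layer_max / per_ch_max)).astype(np.int64)
    shifts = np.clip(shifts, 0, 2**shift_bits - 1)

    equalized = flat * (2.0 ** shifts)[:, None]          # rescale each channel

    # Ordinary layer-wise symmetric quantization on the equalized weights.
    qmax = 2 ** (num_bits - 1) - 1                       # 127 for int8
    scale = np.max(np.abs(equalized)) / qmax
    q_weights = np.clip(np.round(equalized / scale), -qmax - 1, qmax).astype(np.int8)

    return q_weights.reshape(weights.shape), scale, shifts
```

At inference time, as the abstract describes, the inverse factor 2**(-shifts[c]) for output channel c would be folded into the existing per-layer output rescaling, so equalization adds no extra multiplies and no extra weight storage.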
