Paper Title

Weight Equalizing Shift Scaler-Coupled Post-training Quantization

Authors

Jihun Oh, SangJeong Lee, Meejeong Park, Pooni Walagaurav, Kiseok Kwon

Abstract

Post-training, layer-wise quantization is preferable because it requires no retraining and is hardware-friendly. Nevertheless, accuracy degradation occurs when a neural network model has large differences among its per-output-channel weight ranges. In particular, the MobileNet family suffers a drastic drop in top-1 accuracy, from 70.60% ~ 71.87% to 0.1%, on the ImageNet dataset after 8-bit weight quantization. To mitigate this significant accuracy reduction, we propose a new weight equalizing shift scaler that rescales the weight range of each channel by a 4-bit binary shift prior to layer-wise quantization. To recover the original output range, the inverse binary shift is efficiently fused into the existing per-layer scale compounding in the fixed-point convolutional operator of the custom neural processing unit. The binary shift is the key feature of our algorithm: it significantly improves accuracy without increasing the memory footprint. As a result, our proposed method achieves a top-1 accuracy of 69.78% ~ 70.96% on MobileNets and shows robust performance across different network models and tasks, which is competitive with channel-wise quantization results.
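Since the abstract only outlines the method, the NumPy sketch below illustrates one plausible way the per-channel 4-bit binary shift could equalize weight ranges before an ordinary symmetric 8-bit layer-wise quantization. All function and variable names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch, assuming symmetric 8-bit layer-wise weight quantization.
# Names (equalize_and_quantize, shift_bits, etc.) are hypothetical.
import numpy as np

def equalize_and_quantize(weights, shift_bits=4, num_bits=8):
    """weights: float array of shape (out_channels, ...).
    Returns int8 weights, one per-layer scale, and the per-channel shifts."""
    out_ch = weights.shape[0]
    flat = weights.reshape(out_ch, -1)

    per_ch_max = np.max(np.abs(flat), axis=1) + 1e-12   # per-out-channel range
    layer_max = per_ch_max.max()                         # per-layer range

    # Left shift (power of two) that raises each small channel toward the
    # layer range; constrained to fit in 4 bits.
    shifts = np.floor(np.log2(layer_max / per_ch_max)).astype(np.int64)
    shifts = np.clip(shifts, 0, 2**shift_bits - 1)

    equalized = flat * (2.0 ** shifts)[:, None]          # rescale each channel

    # Ordinary layer-wise symmetric quantization on the equalized weights.
    qmax = 2 ** (num_bits - 1) - 1                       # 127 for int8
    scale = np.max(np.abs(equalized)) / qmax
    q_weights = np.clip(np.round(equalized / scale), -qmax - 1, qmax).astype(np.int8)

    return q_weights.reshape(weights.shape), scale, shifts
```

At inference time, as the abstract describes, the inverse factor 2**(-shifts[c]) for output channel c would be folded into the existing per-layer output rescaling, so equalization adds no extra multiplies and no extra weight storage.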
