Paper title
Soft Threshold Ternary Networks
Authors
Abstract
Large neural networks are difficult to deploy on mobile devices because of their intensive computation and storage requirements. To alleviate this, we study ternarization, a balance between efficiency and accuracy that quantizes both weights and activations into ternary values. In previous ternarized neural networks, a hard threshold Δ is introduced to determine quantization intervals. Although the selection of Δ greatly affects the training results, previous works estimate Δ via an approximation or treat it as a hyper-parameter, which is suboptimal. In this paper, we present Soft Threshold Ternary Networks (STTN), which enable the model to automatically determine quantization intervals instead of depending on a hard threshold. Concretely, at training time we replace the original ternary kernel with the sum of two binary kernels, so that each ternary value is determined by the combination of the two corresponding binary values. At inference time, we add up the two binary kernels to obtain a single ternary kernel. Our method dramatically outperforms current state-of-the-art methods, narrowing the performance gap between full-precision networks and extremely low-bit networks. Experiments on ImageNet with ResNet-18 (Top-1 66.2%) achieve a new state of the art. Update: In this version, we further fine-tune the experimental hyperparameters and training procedure. The latest STTN shows that ResNet-18 with ternary weights and ternary activations achieves up to 68.2% Top-1 accuracy on ImageNet. Code is available at: github.com/WeixiangXu/STTN.
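The two-binary-kernel idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names (`binarize`, `merge_to_ternary`, `hard_threshold_ternarize`), the sign-based binarization, and the absence of any scaling factor are assumptions made for illustration. The sketch shows that the sum of two binary kernels takes only three distinct values, so no threshold Δ is needed, and that by linearity the two-kernel training-time output equals the output of the single merged kernel.

```python
import numpy as np

def binarize(w):
    """Map real-valued latent weights to {-1, +1} (assumed sign rule; 0 maps to +1)."""
    return np.where(w >= 0, 1, -1).astype(np.int8)

def merge_to_ternary(w1, w2):
    """Add two binary kernels. The sum lies in {-2, 0, +2}, i.e. three levels.

    A position is zero exactly when the two binary kernels disagree,
    so no explicit threshold is involved.
    """
    return binarize(w1) + binarize(w2)

def hard_threshold_ternarize(w, delta):
    """Baseline for contrast: ternarize with a fixed hard threshold delta."""
    return (np.sign(w) * (np.abs(w) > delta)).astype(np.int8)

rng = np.random.default_rng(0)
w1 = rng.standard_normal((3, 3))  # latent weights of the first binary kernel
w2 = rng.standard_normal((3, 3))  # latent weights of the second binary kernel
x = rng.standard_normal((3, 3))   # one input patch

ternary = merge_to_ternary(w1, w2)

# Training-time view: two binary kernels applied separately, outputs added.
train_out = np.sum(x * binarize(w1)) + np.sum(x * binarize(w2))
# Inference-time view: one merged ternary kernel.
infer_out = np.sum(x * ternary)

print(ternary)
print(np.isclose(train_out, infer_out))
```

In the hard-threshold baseline the zero region is fixed by the chosen `delta`, whereas in the merged kernel it depends only on where the two learned binary kernels disagree.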