Paper Title
Why Quantization Improves Generalization: NTK of Binary Weight Neural Networks
Paper Authors
Paper Abstract
量化的神经网络引起了很多关注,因为它们在推断过程中降低了空间和计算复杂性。此外,人们已经有民间传说量化是一种隐性的正规化器,因此可以改善神经网络的普遍性,但是没有现有的工作正式使这种有趣的民间传说形式化。在本文中,我们将神经网络中的二元权重作为随机舍入的随机变量,并研究神经网络中不同层的分布传播。我们提出一个准神经网络来近似分布传播,该分布传播是一个具有连续参数和平滑激活函数的神经网络。我们为该准神经网络得出神经切线核(NTK),并表明NTK的特征值大致衰减,这与随机尺度的高斯内核相当。这反过来表明,与具有实际值重量的二元重量神经网络相比,二元重量神经网络的繁殖核希尔伯特空间(RKHS)涵盖了严格的功能子集。我们使用实验来验证我们提出的准神经网络可以很好地近似二进制重量神经网络。此外,与实际值重量神经网络相比,二元重量神经网络的概括差距较低,这与高斯内核和拉普拉斯内核之间的差异相似。
Quantized neural networks have drawn a lot of attention as they reduce the space and computational complexity during inference. Moreover, there has been folklore that quantization acts as an implicit regularizer and thus can improve the generalizability of neural networks, yet no existing work formalizes this interesting folklore. In this paper, we take the binary weights in a neural network as random variables under stochastic rounding, and study the distribution propagation over different layers in the neural network. We propose a quasi neural network to approximate the distribution propagation, which is a neural network with continuous parameters and a smooth activation function. We derive the neural tangent kernel (NTK) for this quasi neural network, and show that its eigenvalues decay at an approximately exponential rate, comparable to that of a Gaussian kernel with randomized scale. This in turn indicates that the reproducing kernel Hilbert space (RKHS) of a binary weight neural network covers a strict subset of the functions covered by a network with real-valued weights. We use experiments to verify that the proposed quasi neural network approximates binary weight neural networks well. Furthermore, binary weight neural networks give a lower generalization gap than real-valued weight neural networks, which is similar to the difference between the Gaussian kernel and the Laplace kernel.
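To make the "binary weights as random variables under stochastic rounding" setup concrete, the following is a minimal sketch of the standard stochastic-rounding quantizer, in which a binary weight is an unbiased estimate of the underlying real-valued weight. The helper name `stochastic_binarize` and the clipping to [-1, 1] are illustrative assumptions; the paper's exact quantizer and scaling may differ.

```python
import numpy as np

def stochastic_binarize(w, rng=None):
    """Stochastically round real-valued weights in [-1, 1] to {-1, +1}.

    Each weight is rounded to +1 with probability (1 + w) / 2 and to -1
    otherwise, so the binary weight is an unbiased estimate of w
    (E[w_binary] = w). Illustrative sketch; the paper's quantizer may differ.
    """
    rng = np.random.default_rng() if rng is None else rng
    w = np.clip(w, -1.0, 1.0)
    p_plus = (1.0 + w) / 2.0  # probability of rounding up to +1
    return np.where(rng.random(w.shape) < p_plus, 1.0, -1.0)

# Usage: averaging many stochastic roundings recovers the real-valued weight,
# which is the property that lets the weight be treated as a random variable
# whose distribution propagates through the layers.
w = np.array([-0.8, -0.2, 0.0, 0.5, 0.9])
samples = np.stack([stochastic_binarize(w) for _ in range(10_000)])
print(samples.mean(axis=0))  # approximately equal to w
```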