Paper Title
Understanding Learning Dynamics of Binary Neural Networks via Information Bottleneck
Paper Authors
Paper Abstract
Compact neural networks are essential for affordable and power-efficient deep learning solutions. Binary Neural Networks (BNNs) take compactification to the extreme by constraining both weights and activations to two levels, $\{+1, -1\}$. However, training BNNs is not easy due to the discontinuity in their activation functions, and the training dynamics of BNNs are not well understood. In this paper, we present an information-theoretic perspective on BNN training. We analyze BNNs through the Information Bottleneck principle and observe that the training dynamics of BNNs differ considerably from those of Deep Neural Networks (DNNs). While DNNs exhibit separate empirical risk minimization and representation compression phases, our numerical experiments show that in BNNs both phases occur simultaneously. Since BNNs have lower expressive capacity, they tend to find efficient hidden representations concurrently with label fitting. Experiments on multiple datasets support these observations, and we see consistent behavior across different activation functions in BNNs.
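As a minimal illustration of the discontinuity issue the abstract refers to (not taken from the paper itself), the sketch below shows the common straight-through-estimator workaround for training with a binary sign activation; the class name and clipping threshold are illustrative assumptions, not the authors' method.

```python
# Illustrative sketch: binarizing activations to {+1, -1} with a
# straight-through estimator (STE). Hypothetical names; standard PyTorch API.
import torch


class BinarySign(torch.autograd.Function):
    """Forward: quantize inputs to {+1, -1}.
    Backward: pass gradients through unchanged where |x| <= 1 (STE)."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        # Map x >= 0 to +1 and x < 0 to -1 (avoids sign(0) = 0).
        return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Clip gradients outside [-1, 1], a common choice in BNN training.
        return grad_output * (x.abs() <= 1).to(grad_output.dtype)


# Usage: the non-differentiable binarization still lets gradients flow.
x = torch.randn(4, requires_grad=True)
y = BinarySign.apply(x)   # values in {+1, -1}
y.sum().backward()        # gradients computed via the STE
```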