Paper Title
Training Binary Neural Networks using the Bayesian Learning Rule
Paper Authors
Paper Abstract
Neural networks with binary weights are computation-efficient and hardware-friendly, but their training is challenging because it involves a discrete optimization problem. Surprisingly, ignoring the discrete nature of the problem and using gradient-based methods, such as the Straight-Through Estimator, still works well in practice. This raises the question: are there principled approaches which justify such methods? In this paper, we propose such an approach using the Bayesian learning rule. The rule, when applied to estimate a Bernoulli distribution over the binary weights, results in an algorithm which justifies some of the algorithmic choices made by the previous approaches. The algorithm not only obtains state-of-the-art performance, but also enables uncertainty estimation for continual learning to avoid catastrophic forgetting. Our work provides a principled approach for training binary neural networks which justifies and extends existing approaches.
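The Straight-Through Estimator mentioned in the abstract can be illustrated with a minimal sketch: weights are quantized to {-1, +1} in the forward pass, while the backward pass copies the gradient from the binary weights to the real-valued latent weights (optionally masking it where the latent weight is saturated, a common variant). This is an illustrative NumPy sketch, not the paper's Bayesian algorithm; the function names and the toy squared-loss step are assumptions for demonstration.

```python
import numpy as np

def binarize(w):
    """Forward pass: quantize real-valued latent weights to {-1, +1}."""
    return np.where(w >= 0, 1.0, -1.0)

def ste_grad(w, grad_wrt_binary, clip=1.0):
    """Straight-Through Estimator backward pass: pass the gradient of the
    loss w.r.t. the binary weights straight through to the latent weights,
    zeroed where |w| exceeds the clipping threshold (a common variant)."""
    pass_through = (np.abs(w) <= clip).astype(w.dtype)
    return grad_wrt_binary * pass_through

# One SGD step on the latent weights of a toy linear model with squared
# loss on a single example (all names and values here are illustrative).
rng = np.random.default_rng(0)
w = rng.normal(size=4)            # real-valued latent weights
x = rng.normal(size=4)            # one input example
y = 1.0                           # target
wb = binarize(w)                  # binary weights used in the forward pass
pred = wb @ x
grad_wb = 2.0 * (pred - y) * x    # dL/d(wb) for squared loss
w -= 0.1 * ste_grad(w, grad_wb)   # update latent weights via the STE
```

Note that the discrete `binarize` step has zero gradient almost everywhere, so naive backpropagation would never update `w`; the STE's identity backward pass is exactly the heuristic whose surprising effectiveness the paper sets out to justify.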