Paper Title
Bayesian Deep Learning and a Probabilistic Perspective of Generalization
Paper Authors
Paper Abstract
The key distinguishing property of a Bayesian approach is marginalization, rather than using a single setting of weights. Bayesian marginalization can particularly improve the accuracy and calibration of modern deep neural networks, which are typically underspecified by the data, and can represent many compelling but different solutions. We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization, and propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction, without significant overhead. We also investigate the prior over functions implied by a vague distribution over neural network weights, explaining the generalization properties of such models from a probabilistic perspective. From this perspective, we explain results that have been presented as mysterious and distinct to neural network generalization, such as the ability to fit images with random labels, and show that these results can be reproduced with Gaussian processes. We also show that Bayesian model averaging alleviates double descent, resulting in monotonic performance improvements with increased flexibility. Finally, we provide a Bayesian perspective on tempering for calibrating predictive distributions.
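Note: the marginalization the abstract refers to is the Bayesian model average, in which the predictive distribution integrates over the posterior on weights rather than conditioning on a single setting. As a reference point, here is a minimal sketch of that identity and of its deep-ensemble approximation, followed by one common form of tempering; the notation ($J$ ensemble members, temperature $T$, solutions $\hat{w}_j$) is chosen here for illustration and is not quoted from the paper:

\[
p(y \mid x, \mathcal{D}) \;=\; \int p(y \mid x, w)\, p(w \mid \mathcal{D})\, dw \;\approx\; \frac{1}{J} \sum_{j=1}^{J} p(y \mid x, \hat{w}_j),
\]

where each $\hat{w}_j$ is the solution found by an independently initialized training run, so the ensemble members sample different basins of attraction. One common form of tempering rescales the likelihood's contribution to the posterior,

\[
p_T(w \mid \mathcal{D}) \;\propto\; p(\mathcal{D} \mid w)^{1/T} \, p(w),
\]

recovering the ordinary posterior at $T = 1$; the Bayesian perspective on tempering mentioned in the abstract concerns how the choice of $T$ affects the calibration of the resulting predictive distribution.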