Paper Title
Wide Mean-Field Bayesian Neural Networks Ignore the Data
Paper Authors
Paper Abstract
Bayesian neural networks (BNNs) combine the expressive power of deep learning with the advantages of the Bayesian formalism. In recent years, the analysis of wide, deep BNNs has provided theoretical insight into their priors and posteriors. However, we have no analogous insight into their posteriors under approximate inference. In this work, we show that mean-field variational inference entirely fails to model the data when the network width is large and the activation function is odd. Specifically, for fully-connected BNNs with odd activation functions and a homoscedastic Gaussian likelihood, we show that the optimal mean-field variational posterior predictive (i.e., function space) distribution converges to the prior predictive distribution as the width tends to infinity. We generalize aspects of this result to other likelihoods. Our theoretical results are suggestive of underfitting behavior previously observed in BNNs. While our convergence bounds are non-asymptotic and constants in our analysis can be computed, they are currently too loose to be applicable in standard training regimes. Finally, we show that the optimal approximate posterior need not tend to the prior if the activation function is not odd, showing that our statements cannot be generalized arbitrarily.
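
To make the central claim concrete, here is one hedged way to formalize it; the notation (the width-n network f_n with weights theta, the optimal mean-field posterior q_n^*, the prior p) and the choice of distributional distance d are illustrative assumptions on my part, not the paper's exact statement:

\[
  d\!\left(
    \int p\big(f_n(\mathbf{x}_*) \mid \theta\big)\, q_n^{\star}(\mathrm{d}\theta),\;
    \int p\big(f_n(\mathbf{x}_*) \mid \theta\big)\, p(\mathrm{d}\theta)
  \right) \longrightarrow 0
  \qquad \text{as the width } n \to \infty,
\]

for a fixed test input \(\mathbf{x}_*\), under the abstract's assumptions (fully-connected BNN, odd activation, homoscedastic Gaussian likelihood). The first argument is the optimal mean-field posterior predictive and the second is the prior predictive; since the limit is the prior predictive, the fitted model retains no information from the training data, which is the "ignores the data" behavior referenced in the title.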