Paper Title
Bayesian Deep Ensembles via the Neural Tangent Kernel
Paper Authors
Paper Abstract
We explore the link between deep ensembles and Gaussian processes (GPs) through the lens of the Neural Tangent Kernel (NTK): a recent development in understanding the training dynamics of wide neural networks (NNs). Previous work has shown that even in the infinite width limit, when NNs become GPs, there is no GP posterior interpretation to a deep ensemble trained with squared error loss. We introduce a simple modification to standard deep ensembles training, through addition of a computationally-tractable, randomised and untrainable function to each ensemble member, that enables a posterior interpretation in the infinite width limit. When ensembled together, our trained NNs give an approximation to a posterior predictive distribution, and we prove that our Bayesian deep ensembles make more conservative predictions than standard deep ensembles in the infinite width limit. Finally, using finite width NNs we demonstrate that our Bayesian deep ensembles faithfully emulate the analytic posterior predictive when available, and can outperform standard deep ensembles in various out-of-distribution settings, for both regression and classification tasks.
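The core modification described above — adding a computationally tractable, randomised, untrainable function to each ensemble member before training — can be illustrated with a toy sketch. The version below is only a rough illustration, not the paper's exact construction: it uses frozen ReLU random features as a stand-in for a wide network, a closed-form least-squares readout in place of gradient training (valid here because the readout is linear and the loss is squared error), and a frozen, independently initialised random function as the untrainable additive term. The names (`train_member`, `width`, `noise`) and all hyperparameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_features(x, W1, b1):
    """Frozen ReLU random features: a stand-in for a wide hidden layer."""
    return np.maximum(0.0, x @ W1 + b1) / np.sqrt(W1.shape[1])

def train_member(x, y, width=512, noise=0.1, bayesian=True, rng=rng):
    """One ensemble member: least-squares readout over random features.

    If `bayesian` is True, a frozen, independently initialised random
    function delta(x) is added to the member's output, the readout is fit
    to the residual y - delta(x), and the member predicts f(x) + delta(x).
    This is an illustrative sketch of the additive-untrainable-function
    idea, not the paper's exact NTK construction.
    """
    din = x.shape[1]
    W1 = rng.standard_normal((din, width))
    b1 = rng.standard_normal(width)
    phi = random_features(x, W1, b1)

    if bayesian:
        # untrainable random function from an independent initialisation
        Wd = rng.standard_normal((din, width))
        bd = rng.standard_normal(width)
        wd = rng.standard_normal(width) * noise
        delta = lambda z: random_features(z, Wd, bd) @ wd
    else:
        delta = lambda z: np.zeros(z.shape[0])

    # ridge-regularised least squares on the residual targets
    target = y - delta(x)
    w = np.linalg.solve(phi.T @ phi + noise**2 * np.eye(width), phi.T @ target)
    return lambda z: random_features(z, W1, b1) @ w + delta(z)

# Toy 1-D regression: the ensemble's mean and spread over a test grid.
x = np.linspace(-1.0, 1.0, 20)[:, None]
y = np.sin(3.0 * x[:, 0]) + 0.05 * rng.standard_normal(20)
members = [train_member(x, y) for _ in range(10)]
x_test = np.linspace(-2.0, 2.0, 50)[:, None]
preds = np.stack([m(x_test) for m in members])   # shape (10, 50)
mean, std = preds.mean(0), preds.std(0)          # predictive mean / spread
```

Because each member fits the residual `y - delta(x)` on the training inputs, the frozen `delta` roughly cancels near the data but varies freely away from it, which is one intuition for why such ensembles can be more conservative out of distribution.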