Paper Title
A Tutorial on Sparse Gaussian Processes and Variational Inference
Paper Authors
Paper Abstract
Gaussian processes (GPs) provide a framework for Bayesian inference that can offer principled uncertainty estimates for a large range of problems. For example, if we consider regression problems with Gaussian likelihoods, a GP model enjoys a posterior in closed form. However, identifying the posterior GP scales cubically with the number of training examples and requires storing all examples in memory. To overcome these obstacles, sparse GPs have been proposed that approximate the true posterior GP with pseudo-training examples. Importantly, the number of pseudo-training examples is user-defined and enables control over computational and memory complexity. In the general case, sparse GPs do not enjoy closed-form solutions, and one has to resort to approximate inference. In this context, a convenient choice for approximate inference is variational inference (VI), where the problem of Bayesian inference is cast as an optimization problem: maximizing a lower bound of the log marginal likelihood. This paves the way for a powerful and versatile framework in which pseudo-training examples are treated as optimization arguments of the approximate posterior and are jointly identified together with the hyperparameters of the generative model (i.e., the prior and the likelihood). The framework naturally handles a wide scope of supervised learning problems, ranging from regression with heteroscedastic and non-Gaussian likelihoods to classification with discrete labels, as well as problems with multidimensional labels. The purpose of this tutorial is to provide access to the basic matter for readers without prior knowledge of either GPs or VI. A proper exposition of the subject also enables access to more recent advances (such as importance-weighted VI, as well as interdomain, multioutput, and deep GPs) that can serve as inspiration for new research ideas.
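To make the abstract's scaling argument concrete, below is a minimal, self-contained NumPy sketch (not code from the paper): an exact GP regression posterior whose Cholesky step scales cubically in the number of training examples N, next to the collapsed variational lower bound of Titsias (2009) on the log marginal likelihood for M pseudo-training inputs, which costs O(N M^2) instead. All function names and fixed hyperparameter values (rbf_kernel, gp_posterior, collapsed_elbo, the noise variance) are illustrative assumptions for this sketch.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel matrix between two input sets.
    sq = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2.0 * X1 @ X2.T
    return variance * np.exp(-0.5 * sq / lengthscale**2)

def gp_posterior(X, y, X_star, noise_var=0.1):
    # Exact GP regression posterior at test inputs X_star. The Cholesky
    # factorization of the N x N kernel matrix costs O(N^3) time and O(N^2)
    # memory -- the bottleneck that sparse GPs are designed to avoid.
    K = rbf_kernel(X, X) + noise_var * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    K_s = rbf_kernel(X, X_star)
    mean = K_s.T @ alpha
    V = np.linalg.solve(L, K_s)
    cov = rbf_kernel(X_star, X_star) - V.T @ V
    return mean, cov

def collapsed_elbo(X, y, Z, noise_var=0.1):
    # Collapsed variational lower bound for M pseudo-training inputs Z
    # (Titsias, 2009). Costs O(N M^2); maximizing it w.r.t. Z and the kernel
    # hyperparameters identifies the sparse approximate posterior jointly
    # with the generative model, as described in the abstract.
    N, M = len(X), len(Z)
    Kmm = rbf_kernel(Z, Z) + 1e-6 * np.eye(M)          # jitter for stability
    Kmn = rbf_kernel(Z, X)
    L = np.linalg.cholesky(Kmm)
    A = np.linalg.solve(L, Kmn) / np.sqrt(noise_var)   # M x N
    B = np.eye(M) + A @ A.T
    LB = np.linalg.cholesky(B)
    c = np.linalg.solve(LB, A @ y) / np.sqrt(noise_var)
    # log N(y | 0, Q_nn + noise_var*I) via the matrix inversion lemma,
    # where Q_nn = Kmn^T Kmm^{-1} Kmn is the Nystroem approximation of Knn.
    log_marg = (-0.5 * N * np.log(2.0 * np.pi * noise_var)
                - np.sum(np.log(np.diag(LB)))
                - 0.5 * (y @ y) / noise_var
                + 0.5 * (c @ c))
    # Trace correction -tr(Knn - Q_nn) / (2 noise_var); with unit kernel
    # variance tr(Knn) = N, and tr(Q_nn) = noise_var * sum(A^2).
    return log_marg - 0.5 * (N - noise_var * np.sum(A**2)) / noise_var

# Toy check on 1-D regression: the collapsed bound never exceeds the exact
# log marginal likelihood and tightens as the number of inducing inputs grows.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(200)
Z = X[rng.choice(200, size=20, replace=False)]         # M = 20 inducing inputs

K = rbf_kernel(X, X) + 0.1 * np.eye(200)
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
exact = -0.5 * y @ alpha - np.sum(np.log(np.diag(L))) - 100.0 * np.log(2.0 * np.pi)
print(exact, collapsed_elbo(X, y, Z))
```

On this toy problem the printed bound stays below the exact log marginal likelihood, and maximizing it with respect to Z and the hyperparameters is precisely the optimization problem the abstract casts Bayesian inference as.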