Adaptive Cholesky Gaussian Processes

Paper title

Adaptive Cholesky Gaussian Processes

Authors

Simon Bartels, Kristoffer Stensbo-Smidt, Pablo Moreno-Muñoz, Wouter Boomsma, Jes Frellsen, Søren Hauberg

Abstract

We present a method to approximate Gaussian process regression models for large datasets by considering only a subset of the data. Our approach is novel in that the size of the subset is selected on the fly during exact inference with little computational overhead. From an empirical observation that the log-marginal likelihood often exhibits a linear trend once a sufficient subset of a dataset has been observed, we conclude that many large datasets contain redundant information that only slightly affects the posterior. Based on this, we provide probabilistic bounds on the full model evidence that can identify such subsets. Remarkably, these bounds are largely composed of terms that appear in intermediate steps of the standard Cholesky decomposition, allowing us to modify the algorithm to adaptively stop the decomposition once enough data have been observed.
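
The observation that the relevant quantities appear as intermediate steps of the Cholesky decomposition can be illustrated with a minimal sketch. It is not the authors' algorithm and does not implement their probabilistic bounds or stopping rule; it only shows that a block-wise Cholesky factorization yields, after each block, the exact log-marginal likelihood of the data processed so far. The RBF kernel, the function names `rbf_kernel` and `blockwise_log_evidence`, and the parameters `noise` and `block` are assumptions made for illustration.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular


def rbf_kernel(a, b, lengthscale=1.0):
    """Squared-exponential kernel (an illustrative choice, not from the paper)."""
    d2 = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(d2, 0.0) / lengthscale**2)


def blockwise_log_evidence(X, y, noise=0.1, block=50):
    """Block-wise Cholesky of K + noise*I for a zero-mean GP.

    Returns a list of (m, log p(y[:m])) pairs, one per processed block.
    `noise` is the observation noise variance.
    """
    n = X.shape[0]
    L = np.zeros((n, n))   # lower Cholesky factor, filled block row by block row
    alpha = np.zeros(n)    # alpha = L^{-1} y, filled incrementally
    running = 0.0
    partial = []
    for s in range(0, n, block):
        e = min(s + block, n)
        K_new = rbf_kernel(X[s:e], X[s:e]) + noise * np.eye(e - s)
        resid = y[s:e].copy()
        if s > 0:
            K_cross = rbf_kernel(X[s:e], X[:s])
            # Off-diagonal block B solves B @ L11.T = K_cross.
            B = solve_triangular(L[:s, :s], K_cross.T, lower=True).T
            L[s:e, :s] = B
            K_new -= B @ B.T            # Schur complement
            resid -= B @ alpha[:s]
        L_blk = cholesky(K_new, lower=True)
        L[s:e, s:e] = L_blk
        alpha[s:e] = solve_triangular(L_blk, resid, lower=True)
        # Contribution of this block to the log-marginal likelihood.
        running += (-0.5 * alpha[s:e] @ alpha[s:e]
                    - np.sum(np.log(np.diag(L_blk)))
                    - 0.5 * (e - s) * np.log(2.0 * np.pi))
        partial.append((e, running))
    return partial
```

Plotting the second entry of each pair against the first would show the trend of the log-marginal likelihood as more data are processed. A stopping criterion such as the one described in the abstract would be checked inside the loop, but its form is not given here and is therefore left out.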
