Paper title
Scalable computation of predictive probabilities in probit models with Gaussian process priors
Paper authors
Paper abstract
Predictive models for binary data are fundamental in various fields, and the growing complexity of modern applications has motivated several flexible specifications for modeling the relationship between the observed predictors and the binary responses. A widely implemented solution is to express the probability parameter via a probit mapping of a Gaussian process indexed by the predictors. However, unlike in continuous settings, closed-form results for predictive distributions are lacking in binary models with Gaussian process priors. Markov chain Monte Carlo methods and approximation strategies provide common solutions to this problem, but state-of-the-art algorithms are either computationally intractable or inaccurate in moderate-to-high dimensions. In this article, we aim to fill this gap by deriving closed-form expressions for the predictive probabilities in probit Gaussian processes that rely either on cumulative distribution functions of multivariate Gaussians or on functionals of multivariate truncated normals. To evaluate these quantities, we develop novel scalable solutions based on tile-low-rank Monte Carlo methods for computing multivariate Gaussian probabilities, and on mean-field variational approximations of multivariate truncated normals. Closed-form expressions for the marginal likelihood and for the posterior distribution of the Gaussian process are also discussed. As shown in simulated and real-world empirical studies, the proposed methods scale to dimensions where state-of-the-art solutions are impractical.
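To make the role of multivariate Gaussian cumulative distribution functions concrete, the following is a minimal sketch and not the paper's method. It assumes a zero-mean Gaussian process with kernel matrix K and the standard latent-variable form of the probit model, y_i = 1(z_i > 0) with z = f + e and e ~ N(0, I). Under these assumptions the probability of any sign pattern is a Gaussian orthant probability, so the predictive probability is a ratio of two multivariate Gaussian CDFs evaluated at zero. The function names (`orthant_probability`, `predictive_probability`, `rbf_kernel`) are hypothetical, and the CDFs are evaluated with SciPy's general-purpose `multivariate_normal.cdf` rather than the tile-low-rank Monte Carlo scheme mentioned in the abstract, so this is only practical for small n.

```python
import numpy as np
from scipy.stats import multivariate_normal


def rbf_kernel(x, lengthscale=1.0):
    """Squared-exponential kernel matrix for 1-D inputs (illustrative choice)."""
    x = np.asarray(x, dtype=float)
    diff = x[:, None] - x[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


def orthant_probability(signs, cov):
    """Pr(signs_i * z_i > 0 for all i) when z ~ N(0, cov).

    With D = diag(signs), -D z ~ N(0, D cov D), so the probability
    equals the CDF of N(0, D cov D) evaluated at the origin.
    """
    d = np.asarray(signs, dtype=float)
    signed_cov = d[:, None] * cov * d[None, :]
    zeros = np.zeros(d.shape[0])
    return float(multivariate_normal.cdf(zeros, mean=zeros, cov=signed_cov))


def predictive_probability(K_full, y):
    """Pr(y_new = 1 | y) under the assumed probit GP model.

    K_full is the (n+1) x (n+1) kernel matrix with the new point last;
    y holds the n observed binary responses (0 or 1).
    """
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    # Latent utilities z = f + e have covariance K + I (assumed unit noise).
    cov = K_full + np.eye(n + 1)
    signs = 2.0 * y - 1.0
    joint = orthant_probability(np.append(signs, 1.0), cov)  # Pr(y, y_new = 1)
    marginal = orthant_probability(signs, cov[:n, :n])       # Pr(y)
    return joint / marginal


if __name__ == "__main__":
    # Two training inputs close to the new input, both labelled 1.
    inputs = [0.0, 0.1, 0.05]  # new point is last
    K = rbf_kernel(inputs)
    print(predictive_probability(K, [1, 1]))
```

The denominator in this ratio is the marginal likelihood of the observed labels under the same assumptions. Both quantities are n-dimensional or (n+1)-dimensional integrals, which is why a scalable evaluation strategy becomes necessary as n grows.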