Distributed Bayesian Varying Coefficient Modeling Using a Gaussian Process Prior

Title

Distributed Bayesian Varying Coefficient Modeling Using a Gaussian Process Prior

Authors

Guhaniyogi, Rajarshi; Li, Cheng; Savitsky, Terrance D.; Srivastava, Sanvesh

Abstract

Varying coefficient models (VCMs) are widely used for estimating nonlinear regression functions for functional data. Their Bayesian variants using Gaussian process priors on the functional coefficients, however, have received limited attention in massive data applications, mainly because posterior computation with Markov chain Monte Carlo (MCMC) algorithms is prohibitively slow. We address this problem using a divide-and-conquer Bayesian approach. We first create a large number of data subsamples of much smaller size. Then, we formulate the VCM as a linear mixed-effects model and develop a data augmentation algorithm for obtaining MCMC draws on all the subsets in parallel. Finally, we aggregate the MCMC-based estimates of the subset posteriors into a single Aggregated Monte Carlo (AMC) posterior, which serves as a computationally efficient alternative to the true posterior distribution. Theoretically, we derive minimax optimal posterior convergence rates for the AMC posteriors of both the varying coefficients and the mean regression function. We also quantify the orders of the subset sample size and the number of subsets. Empirical results across diverse simulations and a real data analysis show that the combination schemes satisfying our theoretical assumptions, including the AMC posterior, achieve better estimation performance than their main competitors.
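
The divide-and-conquer pipeline described above (split the data, compute subset posteriors in parallel, combine them) can be illustrated with a minimal Python/NumPy sketch. It is not the authors' implementation and rests on several assumptions: a single varying coefficient, y_i = x_i β(t_i) + ε_i with ε_i ~ N(0, σ²) and a squared-exponential Gaussian process prior on β; known σ² and fixed kernel hyperparameters, so each subset posterior is Gaussian and exact draws replace the paper's mixed-effects data augmentation MCMC; each subset likelihood raised to the power k (the number of subsets), a device used in related divide-and-conquer Bayesian work; and pointwise quantile averaging (a one-dimensional Wasserstein barycenter) as the combination step, which stands in for, and may differ from, the paper's AMC construction. All function names are hypothetical.

```python
import numpy as np

# Hypothetical sketch: quantile averaging stands in for the paper's AMC combination,
# and closed-form Gaussian draws stand in for its data augmentation MCMC.


def sq_exp_kernel(a, b, length_scale=0.2, variance=1.0):
    """Squared-exponential GP covariance between 1-D input vectors a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def subset_posterior_draws(t, x, y, t_star, k, noise_var, n_draws, rng):
    """Marginal draws of beta(t_star) from one subset posterior of
    y_i = x_i * beta(t_i) + eps_i, with beta ~ GP(0, sq_exp_kernel).

    The subset likelihood is raised to the power k, which for Gaussian noise
    is the same as dividing the noise variance by k. With fixed noise variance
    and kernel hyperparameters the subset posterior is Gaussian, so exact
    draws stand in for MCMC here.
    """
    D = np.diag(x)
    cov_y = D @ sq_exp_kernel(t, t) @ D + (noise_var / k) * np.eye(len(t))
    cross = sq_exp_kernel(t_star, t) @ D  # Cov(beta(t_star), y)
    mean = cross @ np.linalg.solve(cov_y, y)
    cov = sq_exp_kernel(t_star, t_star) - cross @ np.linalg.solve(cov_y, cross.T)
    sd = np.sqrt(np.clip(np.diag(cov), 0.0, None))  # guard tiny negative round-off
    # Marginal draws suffice because the combination below works pointwise.
    return mean + sd * rng.standard_normal((n_draws, len(t_star)))


def combine_by_quantile_averaging(draws_per_subset):
    """Pointwise 1-D Wasserstein barycenter: average the sorted draws
    (empirical quantiles) across subsets at each test location."""
    return np.mean([np.sort(d, axis=0) for d in draws_per_subset], axis=0)


def divide_and_conquer_vcm(t, x, y, t_star, k, noise_var=0.1, n_draws=1000, seed=0):
    rng = np.random.default_rng(seed)
    # Random, disjoint, roughly equal-sized subsets of the observation indices.
    subsets = np.array_split(rng.permutation(len(y)), k)
    # Subsets share no data, so this loop could run on separate workers.
    draws = [
        subset_posterior_draws(t[i], x[i], y[i], t_star, k, noise_var, n_draws, rng)
        for i in subsets
    ]
    return combine_by_quantile_averaging(draws)


# Usage on simulated data with true coefficient beta(t) = sin(2*pi*t).
rng = np.random.default_rng(1)
n = 2000
t = rng.uniform(0.0, 1.0, n)
x = rng.normal(size=n)
y = x * np.sin(2 * np.pi * t) + rng.normal(scale=np.sqrt(0.1), size=n)
t_star = np.linspace(0.05, 0.95, 10)

combined = divide_and_conquer_vcm(t, x, y, t_star, k=10)
post_mean = combined.mean(axis=0)
lower, upper = np.quantile(combined, [0.025, 0.975], axis=0)
```

Because the subsets never exchange data, the list comprehension over subsets is the natural place to dispatch work to separate processes or machines; only the n_draws by len(t_star) arrays of draws need to be gathered for the combination step.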
