Paper Title
Spatial Factor Modeling: A Bayesian Matrix-Normal Approach for Misaligned Data
Paper Authors
Paper Abstract
Multivariate spatially oriented data sets are prevalent in the environmental and physical sciences. Scientists seek to jointly model multiple variables, each indexed by a spatial location, to capture the underlying spatial association within each variable as well as the associations among the different dependent variables. Multivariate latent spatial process models have proved effective in driving statistical inference and in delivering better predictive inference at arbitrary locations for the spatial process. High-dimensional multivariate spatial data, the theme of this article, refers to data sets in which the number of spatial locations and the number of spatially dependent variables are both very large. The field has witnessed substantial developments in scalable models for univariate spatial processes, but comparable methods for multivariate spatial processes remain limited, especially when the number of outcomes is moderately large. Here, we extend scalable modeling strategies for a single process to multivariate processes. We pursue Bayesian inference, which is attractive for full uncertainty quantification of the latent spatial process. Our approach exploits distribution theory for the Matrix-Normal distribution, which we use to construct scalable versions of a hierarchical linear model of coregionalization (LMC) and of spatial factor models that deliver inference over a high-dimensional parameter space including the latent spatial process. We illustrate the computational and inferential benefits of our algorithms over competing methods using simulation studies and an analysis of a massive vegetation index data set.
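
For readers unfamiliar with the two ingredients named in the abstract, the following is a minimal sketch in standard textbook notation. All symbols ($Y$, $X$, $B$, $F$, $\Lambda$, $E$, $C_k$, $n$, $q$, $K$) are assumptions chosen for illustration and do not appear in the abstract; the paper's own notation, priors, and scalable process construction may differ.

```latex
% Matrix-normal distribution: an n x q random matrix Z with mean M (n x q),
% row covariance U (n x n), and column covariance V (q x q).
Z \sim \mathrm{MN}_{n \times q}(M, U, V)
\iff
\operatorname{vec}(Z) \sim \mathrm{N}_{nq}\bigl(\operatorname{vec}(M),\; V \otimes U\bigr)

% A generic LMC / spatial factor form (illustrative assumption):
% q outcomes observed at n locations, driven by K latent spatial factors.
%   Y: n x q outcomes,  X: n x p predictors,  B: p x q regression slopes,
%   F: n x K latent factors,  \Lambda: K x q loadings,  E: n x q noise.
Y = X B + F \Lambda + E,
\qquad
F = [\, f_1 : \cdots : f_K \,],
\quad
f_k \sim \mathrm{N}_n(0, C_k) \ \text{independently for } k = 1, \dots, K
```

In this sketch each $C_k$ is the $n \times n$ spatial covariance matrix of the $k$-th latent process. When $K$ is much smaller than $q$, the form is usually described as a spatial factor model, since a few latent processes account for the dependence among many outcomes.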