Paper Title
A Divide and Conquer Algorithm of Bayesian Density Estimation
Paper Authors
Abstract
Data sets for statistical analysis have become so large that storing them on a single machine can be difficult. Even when the data fit on one machine, the computational cost can still be prohibitive. We propose a divide and conquer solution to density estimation using Bayesian mixture modeling, including the infinite mixture case. The methodology can be generalized to other application problems where a Bayesian mixture model is adopted. The proposed prior on each machine or subsample modifies the original prior on both the mixing probabilities and the remaining parameters of the distributions being mixed. The final estimator is obtained by averaging the posterior samples corresponding to the proposed prior on each subset. Despite the large reduction in computing time achieved by splitting the data, the posterior contraction rate of the proposed estimator remains the same (up to a log factor) as that of the original prior when the data are analyzed as a whole. Simulation studies also support the competitiveness of the proposed method compared with the established WASP estimator in the finite-dimensional case. In addition, one of our simulations is performed in a shape-constrained deconvolution context and shows promising results. An application to a GWAS data set shows the advantage of the proposed method over a naive method that uses the original prior.
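The split, fit, and average workflow described in the abstract can be illustrated with a minimal sketch. The abstract does not state the form of the modified prior, so the code below makes several assumptions that are not from the paper: a finite Gaussian mixture with known component means and a common standard deviation, a symmetric Dirichlet prior on the mixing weights with concentration `alpha`, and a plain Gibbs sampler on each subset. All function and parameter names are hypothetical. Passing the original prior unchanged to every subset corresponds to the naive baseline mentioned in the abstract; the paper's subset-specific prior would take the place of `alpha`.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weights, means, sd):
    """Density of a Gaussian mixture with a common standard deviation."""
    return sum(w * normal_pdf(x, m, sd) for w, m in zip(weights, means))


def gibbs_weight_draws(data, means, sd, alpha, n_iter, burn_in, rng):
    """Posterior draws of the mixing weights on one subset.

    Assumption (not from the paper): component means and sd are known,
    and the weights have a symmetric Dirichlet(alpha) prior.
    """
    k = len(means)
    weights = [1.0 / k] * k
    draws = []
    for it in range(n_iter):
        # Step 1: sample a component label for every observation.
        counts = [0] * k
        for x in data:
            probs = [w * normal_pdf(x, m, sd) for w, m in zip(weights, means)]
            label = rng.choices(range(k), weights=probs)[0]
            counts[label] += 1
        # Step 2: sample weights from Dirichlet(alpha + counts),
        # using normalized gamma variates.
        gammas = [rng.gammavariate(alpha + c, 1.0) for c in counts]
        total = sum(gammas)
        weights = [g / total for g in gammas]
        if it >= burn_in:
            draws.append(weights)
    return draws


def divide_and_conquer_density(data, n_subsets, grid, means, sd, alpha, rng,
                               n_iter=60, burn_in=20):
    """Average the posterior density draws from all subsets on a grid."""
    shuffled = list(data)
    rng.shuffle(shuffled)
    # Random split into n_subsets parts of near-equal size.
    subsets = [shuffled[i::n_subsets] for i in range(n_subsets)]
    totals = [0.0] * len(grid)
    n_draws = 0
    for subset in subsets:
        draws = gibbs_weight_draws(subset, means, sd, alpha, n_iter, burn_in, rng)
        for weights in draws:
            for j, x in enumerate(grid):
                totals[j] += mixture_pdf(x, weights, means, sd)
            n_draws += 1
    return [t / n_draws for t in totals]


if __name__ == "__main__":
    rng = random.Random(1)
    # Simulated data: 30% from N(-2, 1) and 70% from N(2, 1).
    data = [rng.gauss(-2.0, 1.0) if rng.random() < 0.3 else rng.gauss(2.0, 1.0)
            for _ in range(400)]
    grid = [-7.0 + 0.1 * i for i in range(141)]
    estimate = divide_and_conquer_density(data, 4, grid, [-2.0, 2.0], 1.0, 1.0, rng)
    print("estimated density at -2 and 2:", estimate[50], estimate[90])
```

In this sketch each subset could be processed on a separate machine, since the subset samplers share no state; only the grid-level density draws need to be collected for the final average.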