Paper Title

Model Based Screening Embedded Bayesian Variable Selection for Ultra-high Dimensional Settings

Authors

Li, Dongjin; Dutta, Somak; Roy, Vivekananda

Abstract

We develop a Bayesian variable selection method, called SVEN, based on a hierarchical Gaussian linear model with priors placed on the regression coefficients as well as on the model space. Sparsity is achieved by using degenerate spike priors on inactive variables, whereas Gaussian slab priors are placed on the coefficients of the important predictors, making the posterior probability of a model available in explicit form (up to a normalizing constant). Strong model selection consistency is shown to be attained when the number of predictors grows nearly exponentially with the sample size, and even when the norm of the mean effects solely due to the unimportant variables diverges, which is a novel and attractive feature. An appealing byproduct of SVEN is the construction of novel model-weight-adjusted prediction intervals. By embedding a unique model-based screening step and using fast Cholesky updates, SVEN produces a highly scalable computational framework to explore gigantic model spaces, rapidly identify regions of high posterior probability, and make fast inference and prediction. A temperature schedule guided by our model selection consistency derivations is used to further mitigate multimodality of the posterior distribution. The performance of SVEN is demonstrated through a number of simulation experiments and a real data example from a genome-wide association study with over half a million markers.
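To illustrate why a degenerate spike combined with a Gaussian slab gives a closed-form model posterior, here is a minimal Python sketch. It is not the SVEN implementation. It assumes a simplified hierarchy that the abstract does not spell out: a known error variance `sigma2`, slab prior beta_gamma ~ N(0, (sigma2 / lam) I), and independent Bernoulli(`w`) inclusion indicators. The function name `log_model_posterior` and all parameter names are hypothetical. Under these assumptions the coefficients integrate out analytically, and a single Cholesky factorization of the small matrix X_gamma^T X_gamma + lam I yields both the quadratic form and the log-determinant.

```python
import numpy as np

def log_model_posterior(X, y, gamma, lam=1.0, w=0.1, sigma2=1.0):
    """Unnormalized log posterior of the model given by boolean mask `gamma`.

    Assumed (illustrative) hierarchy, not taken from the paper:
      y | beta, gamma ~ N(X_gamma beta_gamma, sigma2 * I)
      beta_gamma      ~ N(0, (sigma2 / lam) * I)   (Gaussian slab)
      beta_j = 0 for excluded j                     (degenerate spike)
      gamma_j         ~ Bernoulli(w), independently
    """
    gamma = np.asarray(gamma, dtype=bool)
    n, p = X.shape
    k = int(gamma.sum())

    # Log prior on the model space.
    log_prior = k * np.log(w) + (p - k) * np.log1p(-w)

    quad = float(y @ y)   # y^T M^{-1} y with M = I + X_g X_g^T / lam
    log_det = 0.0         # log det(M)
    if k > 0:
        Xg = X[:, gamma]
        A = Xg.T @ Xg + lam * np.eye(k)
        L = np.linalg.cholesky(A)            # A = L L^T
        v = np.linalg.solve(L, Xg.T @ y)     # v = L^{-1} X_g^T y
        quad -= float(v @ v)                 # Woodbury identity
        # Determinant lemma: det(M) = det(A) / lam^k
        log_det = 2.0 * np.log(np.diag(L)).sum() - k * np.log(lam)

    log_marginal = (-0.5 * n * np.log(2.0 * np.pi * sigma2)
                    - 0.5 * log_det
                    - 0.5 * quad / sigma2)
    return log_marginal + log_prior
```

Comparing this value across candidate masks ranks models by posterior probability without computing the normalizing constant. The sketch refactorizes from scratch for each model; updating the Cholesky factor when a single variable is added or removed, as the abstract describes, would avoid that cost.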
