Paper Title

Robust Information Criterion for Model Selection in Sparse High-Dimensional Linear Regression Models

Authors

Gohain, Prakash B., Jansson, Magnus

Abstract

Model selection in linear regression models is a major challenge when dealing with high-dimensional data, where the number of available measurements (sample size) is much smaller than the dimension of the parameter space. Traditional methods for model selection, such as the Akaike information criterion, the Bayesian information criterion (BIC), and minimum description length, are heavily prone to overfitting in the high-dimensional setting. In this regard, the extended BIC (EBIC), an extension of the original BIC, and the extended Fisher information criterion (EFIC), a combination of EBIC and the Fisher information criterion, are consistent estimators of the true model as the number of measurements grows very large. However, EBIC is not consistent in high signal-to-noise-ratio (SNR) scenarios where the sample size is fixed, and EFIC is not invariant to data scaling, which results in unstable behaviour. In this paper, we propose a new form of the EBIC criterion called EBIC-Robust, which is invariant to data scaling and consistent in both large-sample-size and high-SNR scenarios. Analytical proofs are presented to guarantee its consistency. Simulation results indicate that the performance of EBIC-Robust is superior to that of both EBIC and EFIC.
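To make the criteria named in the abstract concrete, the following is a minimal Python sketch of the standard BIC and EBIC scores for a Gaussian linear model, assuming the usual form n log(RSS/n) + k log n + 2γ log C(p, k), where k is the size of the candidate support and p the total number of predictors. The function names `log_binom` and `ebic`, the parameter `gamma`, and the synthetic data are illustrative assumptions, not taken from the paper. The EBIC-Robust criterion itself is not defined in the abstract and is deliberately not reproduced here.

```python
import math
import numpy as np

def log_binom(p, k):
    # log of the binomial coefficient C(p, k), computed via log-gamma for stability
    return math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)

def ebic(y, X, support, gamma=1.0):
    # Illustrative EBIC score of the sub-model using the columns in `support`.
    # gamma = 0 reduces to the classical BIC; lower scores are better.
    # Assumes the residual sum of squares is strictly positive.
    n, p = X.shape
    k = len(support)
    if k == 0:
        rss = float(y @ y)
    else:
        Xs = X[:, list(support)]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)  # least-squares fit
        resid = y - Xs @ beta
        rss = float(resid @ resid)
    return n * math.log(rss / n) + k * math.log(n) + 2.0 * gamma * log_binom(p, k)

# Hypothetical high-dimensional example: n = 30 samples, p = 100 predictors
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 100))
y = 3.0 * X[:, 0] + rng.standard_normal(30)

for s in ([0], [0, 1], [0, 1, 2]):
    print(s, ebic(y, X, s, gamma=0.0), ebic(y, X, s, gamma=1.0))
```

The extra term 2γ log C(p, k) grows with the number of candidate models of size k, which is what makes EBIC penalize spurious variables more heavily than BIC when p is much larger than n.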
