Paper Title
Ridge Regression Revisited: Debiasing, Thresholding and Bootstrap
Paper Authors
Paper Abstract
The success of the Lasso in the era of high-dimensional data can be attributed to its conducting an implicit model selection, i.e., zeroing out regression coefficients that are not significant. By contrast, classical ridge regression cannot reveal a potential sparsity of the parameters, and may also introduce a large bias in the high-dimensional setting. Nevertheless, recent work on the Lasso involves debiasing and thresholding, the latter in order to further enhance model selection. As a consequence, ridge regression may be worth another look since, after debiasing and thresholding, it may offer some advantages over the Lasso; e.g., it can be easily computed using a closed-form expression. In this paper, we define a debiased and thresholded ridge regression method, and prove a consistency result and a Gaussian approximation theorem. We further introduce a wild bootstrap algorithm to construct confidence regions and perform hypothesis tests for linear combinations of the parameters. In addition to estimation, we consider the problem of prediction, and present a novel hybrid bootstrap algorithm tailored to prediction intervals. Extensive numerical simulations further show that the debiased and thresholded ridge regression has favorable finite-sample performance and may be preferable in some settings.
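To make the closed-form computability claim concrete, the display below is a minimal sketch of the two generic ingredients, the standard ridge estimator and a hard-thresholding step; the paper's specific debiasing correction is not given in the abstract, so the symbols $X$, $y$, $\lambda$, and the threshold level $\tau$ here are illustrative rather than the paper's notation.
\[
  \hat{\beta}^{\mathrm{ridge}} = (X^\top X + \lambda I_p)^{-1} X^\top y,
  \qquad
  \hat{\beta}^{\mathrm{thr}}_j = \hat{\beta}_j \,\mathbf{1}\{ |\hat{\beta}_j| > \tau \}, \quad j = 1, \dots, p,
\]
where the first expression is the well-known closed form of ridge regression with penalty $\lambda > 0$, and the second shows hard thresholding applied coordinatewise to an estimate (in the paper, thresholding follows the debiasing step) so that small coefficients are set exactly to zero, mimicking the Lasso's implicit model selection.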