Title
Rademacher upper bounds for cross-validation errors with an application to the lasso
Authors
Abstract
We establish a general upper bound for $K$-fold cross-validation ($K$-CV) errors that can be adapted to many $K$-CV-based estimators and learning algorithms. Based on the Rademacher complexity of the model and the Orlicz-$\Psi_\nu$ norm of the error process, the CV error upper bound applies to both light-tailed and heavy-tailed error distributions. We also extend the CV error upper bound to $\beta$-mixing data using the technique of independent blocking. We provide a Python package (\texttt{CVbound}, \url{https://github.com/isaac2math}) for computing the CV error upper bound in $K$-CV-based algorithms. Using the lasso as an example, we demonstrate in simulations that the upper bounds are tight and stable across different parameter settings and random seeds. Besides accurately bounding the CV errors for the lasso, the minimizer of the new upper bounds can serve as a criterion for variable selection. Compared with the CV-error minimizer, simulations show that tuning the lasso penalty parameter by minimizing the upper bound yields a sparser and more stable model that retains all of the relevant variables.
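The abstract does not document \texttt{CVbound}'s API, so the following is a minimal sketch (using scikit-learn, not the authors' package) of the baseline procedure the paper compares against: computing $K$-fold CV errors for the lasso over a penalty grid and selecting the penalty that minimizes them. The dimensions, grid, and seeds are illustrative assumptions, not the paper's simulation settings.

```python
# Baseline K-fold CV for the lasso: compute CV errors over a penalty grid
# and pick the CV-error minimizer. (Illustrative sketch; not CVbound's API.)
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n, p, s = 200, 50, 5                      # sample size, dimension, sparsity (assumed)
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 1.0                            # s relevant variables
y = X @ beta + rng.standard_normal(n)

K = 5
lambdas = np.logspace(-3, 0, 30)          # penalty grid (assumed)
cv_errors = np.zeros_like(lambdas)

for i, lam in enumerate(lambdas):
    fold_errors = []
    for train_idx, test_idx in KFold(n_splits=K, shuffle=True, random_state=0).split(X):
        model = Lasso(alpha=lam).fit(X[train_idx], y[train_idx])
        resid = y[test_idx] - model.predict(X[test_idx])
        fold_errors.append(np.mean(resid ** 2))
    cv_errors[i] = np.mean(fold_errors)   # K-fold CV error at this penalty

lam_cv = lambdas[np.argmin(cv_errors)]    # the CV-error minimizer
print(f"CV-error-minimizing penalty: {lam_cv:.4f}")
```

The paper's proposal would replace \texttt{cv\_errors} with the Rademacher/Orlicz-norm upper bound (as computed by \texttt{CVbound}) and minimize that instead; per the abstract, this choice of penalty yields a sparser and more stable model while retaining all relevant variables.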