Paper title
Covariate Selection Based on a Model-free Approach to Linear Regression with Exact Probabilities
Authors
Abstract
In this paper we give a completely new approach to the problem of covariate selection in linear regression. A covariate or a set of covariates is included only if it is better, in the sense of least squares, than the same number of Gaussian covariates consisting of i.i.d. $N(0,1)$ random variables. The Gaussian P-value is defined as the probability that the Gaussian covariates are better. It is given in terms of the Beta distribution, it is exact, and it holds for all data, which makes it model-free. The covariate selection procedures require only a cut-off value $\alpha$ for the Gaussian P-value: the default value in this paper is $\alpha=0.01$. The resulting procedures are very simple, very fast, do not overfit and require only least squares. In particular, there is no regularization parameter, no data splitting, no use of simulations, no shrinkage, and no post-selection inference is required. The paper includes the results of simulations, applications to real data sets and theorems on the asymptotic behaviour under the standard linear model. Here the step-wise procedure performs overwhelmingly better than any other procedure we are aware of. An R package, gausscov, is available.
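To make the definition of the Gaussian P-value concrete, the following is a minimal Python sketch of the single-covariate case only. It is not the gausscov API; the function names `rss` and `gaussian_p_value` and the simulated data are illustrative assumptions. It relies on a standard fact: if a covariate of i.i.d. $N(0,1)$ values, independent of the data, is added to a least-squares fit with $k_0$ linearly independent covariates and $n > k_0 + 1$ observations, the ratio of the residual sum of squares after and before the addition follows a Beta$((n-k_0-1)/2,\,1/2)$ distribution, whatever the response is. The P-value of a real covariate is then the Beta distribution function evaluated at that covariate's own ratio.

```python
import numpy as np
from scipy.stats import beta


def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


def gaussian_p_value(X0, x_new, y):
    """Probability that a single N(0,1) covariate reduces the RSS more than x_new.

    X0 holds the k0 covariates already in the model (count the intercept
    column too). Assumes X0 has full column rank and a nonzero RSS.
    Illustrative helper, not part of gausscov.
    """
    n, k0 = X0.shape
    ratio = rss(np.column_stack([X0, x_new]), y) / rss(X0, y)
    # For a Gaussian covariate the same ratio is Beta((n - k0 - 1)/2, 1/2).
    return float(beta.cdf(ratio, (n - k0 - 1) / 2, 0.5))


# Simulated data (an assumption for illustration): one relevant covariate
# and one irrelevant covariate, tested against an intercept-only model.
rng = np.random.default_rng(0)
n = 100
intercept = np.ones((n, 1))
x_signal = rng.standard_normal(n)
x_noise = rng.standard_normal(n)
y = 2.0 * x_signal + rng.standard_normal(n)

alpha = 0.01  # default cut-off quoted in the abstract
p_signal = gaussian_p_value(intercept, x_signal, y)
p_noise = gaussian_p_value(intercept, x_noise, y)
print("include x_signal:", p_signal < alpha)
print("include x_noise: ", p_noise < alpha)
```

The sketch needs only two least-squares fits and one Beta distribution evaluation per candidate, with no simulation or tuning beyond the cut-off. Selecting the best of several candidates, as a step-wise procedure would, requires a further adjustment of the P-value that is not shown here.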