Paper title

Observable adjustments in single-index models for regularized M-estimators

Authors

Bellec, Pierre C

Abstract

We consider observations $(X,y)$ from single index models with unknown link function and Gaussian covariates, together with a regularized M-estimator $\hat\beta$ constructed from a convex loss function and a convex regularizer. In the regime where the sample size $n$ and the dimension $p$ both increase such that $p/n$ has a finite limit, the behavior of the empirical distribution of $\hat\beta$ and of the predicted values $X\hat\beta$ has previously been characterized in a number of models: the empirical distributions are known to converge to proximal operators of the loss and penalty in a related Gaussian sequence model, which captures the interplay between the ratio $p/n$, the loss, the regularization and the data generating process. This connection between $(\hat\beta, X\hat\beta)$ and the corresponding proximal operators requires solving fixed-point equations that typically involve unobservable quantities, such as the prior distribution on the index or the link function. This paper develops a different theory to describe the empirical distribution of $\hat\beta$ and $X\hat\beta$: approximations of $(\hat\beta, X\hat\beta)$ in terms of proximal operators are provided that only involve observable adjustments. These proposed observable adjustments are data-driven; for example, they do not require prior knowledge of the index or the link function. The new adjustments yield confidence intervals for individual components of the index, as well as estimators of the correlation of $\hat\beta$ with the index. The interplay between loss, regularization and the model is thus captured in a data-driven manner, without solving the fixed-point equations studied in previous works. The results apply to both strongly convex regularizers and unregularized M-estimation. Simulations are provided for the square and logistic loss in single index models, including logistic regression and 1-bit compressed sensing with 20\% corrupted bits.
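To make the setting concrete, the following is a minimal sketch of one simulation setup named in the abstract: 1-bit compressed sensing with 20\% corrupted bits and Gaussian covariates, fitted with the square loss and a ridge penalty (a strongly convex regularizer). The function names, the values of $n$, $p$ and the penalty level, and the choice of ridge are all assumptions made for illustration and are not taken from the paper. The sketch does not implement the paper's observable adjustments. It only computes the oracle correlation between $\hat\beta$ and the true index, which is the quantity the abstract says can be estimated without knowing the index.

```python
import numpy as np

def simulate_one_bit(n, p, flip_prob=0.2, seed=0):
    """Hypothetical helper: 1-bit compressed sensing data with corrupted bits."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(p)
    w /= np.linalg.norm(w)               # unit-norm index (unknown in practice)
    X = rng.standard_normal((n, p))      # Gaussian covariates
    clean = np.sign(X @ w)               # link function: sign of the index projection
    flips = rng.random(n) < flip_prob    # each bit is flipped with probability flip_prob
    y = np.where(flips, -clean, clean)
    return X, y, w, clean

def ridge_square_loss(X, y, lam):
    """Minimize (1/(2n)) * ||y - X b||^2 + (lam/2) * ||b||^2 in closed form."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Assumed sizes: p/n = 0.25, mimicking the proportional regime.
X, y, w, clean = simulate_one_bit(n=2000, p=500)
beta_hat = ridge_square_loss(X, y, lam=0.5)

# Oracle quantity: uses the true index w, which is not observable in practice.
oracle_corr = cosine(beta_hat, w)
print(f"oracle correlation between beta_hat and the index: {oracle_corr:.3f}")
```

Because the link function only reveals the sign of $Xw$, the scale of the index is not identifiable here, which is why the comparison uses a scale-free correlation rather than an estimation error.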
