Title

Adaptive Estimation in Multivariate Response Regression with Hidden Variables

Authors

Xin Bing, Yang Ning, Yaosheng Xu

Abstract


This paper studies the estimation of the coefficient matrix $\Theta$ in multivariate regression with hidden variables, $Y = (\Theta)^TX + (B^*)^TZ + E$, where $Y$ is an $m$-dimensional response vector, $X$ is a $p$-dimensional vector of observable features, $Z$ represents a $K$-dimensional vector of unobserved hidden variables, possibly correlated with $X$, and $E$ is an independent error. The number of hidden variables $K$ is unknown, and both $m$ and $p$ are allowed, but not required, to grow with the sample size $n$. Since only $Y$ and $X$ are observable, we provide necessary conditions for the identifiability of $\Theta$. The same set of conditions is shown to be sufficient when the error $E$ is homoscedastic. Our identifiability proof is constructive and leads to a novel and computationally efficient estimation algorithm, called HIVE. The first step of the algorithm is to estimate the best linear prediction of $Y$ given $X$, in which the unknown coefficient matrix exhibits an additive decomposition into $\Theta$ and a dense matrix originating from the correlation between $X$ and the hidden variable $Z$. Under a row-sparsity assumption on $\Theta$, we propose to minimize a penalized least squares loss by regularizing $\Theta$ via a group-lasso penalty and regularizing the dense matrix via a multivariate ridge penalty. Non-asymptotic deviation bounds on the in-sample prediction error are established. Our second step is to estimate the row space of $B^*$ by leveraging the covariance structure of the residual vector from the first step. In the last step, we remove the effect of the hidden variables by projecting $Y$ onto the complement of the estimated row space of $B^*$. Non-asymptotic error bounds for our final estimator are established. The model identifiability, parameter estimation, and statistical guarantees are further extended to the setting with heteroscedastic errors.
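The three-step structure of HIVE described in the abstract can be sketched in NumPy. This is a minimal illustration under simplifying assumptions, not the paper's estimator: it replaces the penalized group-lasso/ridge step with ordinary least squares, assumes the number of hidden variables $K$ is known, and the function name `hive_sketch` is ours.

```python
import numpy as np

def hive_sketch(X, Y, K):
    """Simplified three-step HIVE sketch (hypothetical helper; OLS is used
    in place of the group-lasso + ridge penalized step from the paper)."""
    n, m = Y.shape
    # Step 1: estimate the best linear prediction of Y given X; the fitted
    # coefficient matrix decomposes additively into Theta plus a dense part
    # driven by the correlation between X and the hidden variables Z.
    F_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    # Step 2: estimate the row space of B* from the covariance structure of
    # the first-step residuals (its top-K eigenvectors span that space).
    R = Y - X @ F_hat                      # n x m residual matrix
    S = R.T @ R / n                        # residual covariance estimate
    eigvals, eigvecs = np.linalg.eigh(S)   # eigenvalues in ascending order
    V = eigvecs[:, -K:]                    # top-K eigenvectors
    # Step 3: remove the hidden-variable effect by projecting Y onto the
    # orthogonal complement of the estimated row space of B*.
    P_perp = np.eye(m) - V @ V.T
    Theta_hat, *_ = np.linalg.lstsq(X, Y @ P_perp, rcond=None)
    return Theta_hat
```

Note that `np.linalg.eigh` returns eigenvalues in ascending order, so the last $K$ columns of `eigvecs` correspond to the leading eigenvectors; recovering $\Theta$ this way implicitly uses the identifiability condition that the rows of $\Theta$ are not absorbed by the row space of $B^*$.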
