A spectral least-squares-type method for heavy-tailed corrupted regression with unknown covariance & heterogeneous noise

Paper title

A spectral least-squares-type method for heavy-tailed corrupted regression with unknown covariance & heterogeneous noise

Authors

Oliveira, Roberto I.; Rico, Zoraida F.; Thompson, Philip

Abstract

We revisit heavy-tailed corrupted least-squares linear regression, assuming a corrupted label-feature sample of size $n$ that contains at most $εn$ arbitrary outliers. We wish to estimate a $p$-dimensional parameter $b^*$ given such a sample of a label-feature pair $(y,x)$ satisfying $y=\langle x,b^*\rangle+ξ$ with heavy-tailed $(x,ξ)$. We only assume that $x$ is $L^4-L^2$ hypercontractive with constant $L>0$ and has covariance matrix $Σ$ with minimum eigenvalue $1/μ^2>0$ and bounded condition number $κ>0$. The noise $ξ$ can be arbitrarily dependent on $x$ and nonsymmetric, as long as $ξx$ has a finite covariance matrix $Ξ$. We propose a near-optimal, computationally tractable estimator based on the power method, assuming no knowledge of $(Σ,Ξ)$ or of the operator norm of $Ξ$. With probability at least $1-δ$, our proposed estimator attains the statistical rate $μ^2\VertΞ\Vert^{1/2}(\frac{p}{n}+\frac{\log(1/δ)}{n}+ε)^{1/2}$ and breakdown point $ε\lesssim\frac{1}{L^4κ^2}$, both optimal in the $\ell_2$-norm, assuming the near-optimal minimum sample size $L^4κ^2(p\log p + \log(1/δ))\lesssim n$, up to a log factor. To the best of our knowledge, this is the first computationally tractable algorithm that simultaneously satisfies all of these properties. Our estimator is based on a two-stage Multiplicative Weight Update algorithm. The first stage estimates a descent direction $\hat v$ with respect to the (unknown) pre-conditioned inner product $\langleΣ(\cdot),\cdot\rangle$. The second stage estimates the descent direction $Σ\hat v$ with respect to the (known) inner product $\langle\cdot,\cdot\rangle$, without knowing or estimating $Σ$.
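The setting in the abstract can be illustrated with a minimal simulation. The sketch below is not the authors' estimator: it only draws data from a model of the form $y=\langle x,b^*\rangle+ξ$ with heavy-tailed $(x,ξ)$, overwrites a fraction $ε$ of the labels with a crude outlier value, and fits ordinary least squares as a non-robust baseline. The function names, the Student-t distributions, the form of the noise, and the outlier value `1e3` are all assumptions made for illustration. `stated_rate` simply evaluates the rate expression from the abstract with constants ignored.

```python
import numpy as np

def sample_corrupted_regression(n, p, eps, rng):
    """Draw n pairs from y = <x, b*> + xi, then overwrite int(eps*n) labels."""
    b_star = np.ones(p) / np.sqrt(p)        # hypothetical true parameter (unit norm)
    x = rng.standard_t(df=5, size=(n, p))   # heavy-tailed features, finite 4th moment
    # Noise whose scale depends on x (heterogeneous), also heavy-tailed.
    xi = rng.standard_t(df=3, size=n) * (1.0 + 0.5 * np.abs(x[:, 0]))
    y = x @ b_star + xi
    idx = rng.choice(n, size=int(eps * n), replace=False)
    y[idx] = 1e3                            # crude stand-in for arbitrary outliers
    return x, y, b_star

def ols(x, y):
    """Ordinary least squares, used here only as a non-robust baseline."""
    return np.linalg.lstsq(x, y, rcond=None)[0]

def stated_rate(mu, xi_op_norm, p, n, delta, eps):
    """Rate expression from the abstract, up to unspecified constants."""
    return mu**2 * np.sqrt(xi_op_norm) * np.sqrt(p / n + np.log(1 / delta) / n + eps)

rng = np.random.default_rng(0)
for eps in (0.0, 0.05):
    x, y, b_star = sample_corrupted_regression(n=2000, p=10, eps=eps, rng=rng)
    err = np.linalg.norm(ols(x, y) - b_star)
    print(f"eps={eps}: OLS l2 error = {err:.3f}")
```

Comparing the two printed errors shows how a small fraction of corrupted labels can affect the plain least-squares fit, which is the failure mode a robust estimator such as the one described in the abstract is meant to avoid.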
