Paper title
Robust regression with covariate filtering: Heavy tails and adversarial contamination
Authors
Abstract
We study the problem of linear regression where both covariates and responses are potentially (i) heavy-tailed and (ii) adversarially contaminated. Several computationally efficient estimators have been proposed for the simpler setting where the covariates are sub-Gaussian and uncontaminated; however, these estimators may fail when the covariates are either heavy-tailed or contain outliers. In this work, we show how to modify the Huber regression, least trimmed squares, and least absolute deviation estimators to obtain estimators which are simultaneously computationally and statistically efficient in the stronger contamination model. Our approach is quite simple, and consists of applying a filtering algorithm to the covariates, and then applying the classical robust regression estimators to the remaining data. We show that the Huber regression estimator achieves near-optimal error rates in this setting, whereas the least trimmed squares and least absolute deviation estimators can be made to achieve near-optimal error after applying a postprocessing step.
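The two-stage idea in the abstract (filter the covariates, then run a classical robust regression on what remains) can be illustrated with a toy sketch. This is not the paper's algorithm. `filter_covariates` is a hypothetical stand-in that drops the points whose covariate lies farthest from the median, and `huber_slope` fits a one-parameter model y ≈ theta * x by iteratively reweighted least squares. The data, `eps`, and `delta` are all made up for illustration.

```python
import math
import statistics


def filter_covariates(xs, ys, eps):
    """Hypothetical stand-in filter (NOT the paper's filtering algorithm):
    drop the ceil(eps * n) points whose covariate is farthest from the median."""
    n = len(xs)
    med = statistics.median(xs)
    k = math.ceil(eps * n)
    order = sorted(range(n), key=lambda i: abs(xs[i] - med))
    keep = sorted(order[: n - k])
    return [xs[i] for i in keep], [ys[i] for i in keep]


def huber_slope(xs, ys, delta=1.0, iters=50):
    """Huber regression for y ~ theta * x via iteratively reweighted least squares."""
    theta = 0.0
    for _ in range(iters):
        num = den = 0.0
        for x, y in zip(xs, ys):
            r = y - theta * x
            # Huber weight: 1 inside the quadratic zone, delta/|r| outside it.
            w = 1.0 if abs(r) <= delta else delta / abs(r)
            num += w * x * y
            den += w * x * x
        theta = num / den
    return theta


# Toy data: the true slope is 3, with one corrupted response and
# two high-leverage covariate outliers that follow y = -3x instead.
xs = [i / 2 for i in range(-10, 11) if i != 0]
ys = [3 * x for x in xs]
ys[xs.index(1.0)] = 40.0          # response outlier
xs += [50.0, 60.0]                # covariate (leverage) outliers
ys += [-150.0, -180.0]

naive = huber_slope(xs, ys)                     # Huber regression on the raw data
fx, fy = filter_covariates(xs, ys, eps=0.1)     # stage 1: filter covariates
filtered = huber_slope(fx, fy)                  # stage 2: Huber on the remainder
print(naive, filtered)
```

In this toy example the leverage points pull the unfiltered Huber fit far from the true slope, because the Huber loss only bounds the influence of the residual and not of the covariate. After filtering, the single remaining response outlier has little effect.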