Paper Title

On Coresets For Regularized Regression

Paper Authors

Rachit Chhaya, Anirban Dasgupta, Supratim Shit

Abstract

We study the effect of norm-based regularization on the size of coresets for regression problems. Specifically, given a matrix $\mathbf{A} \in \mathbb{R}^{n \times d}$ with $n \gg d$, a vector $\mathbf{b} \in \mathbb{R}^n$, and $\lambda > 0$, we analyze the size of coresets for regularized versions of regression of the form $\|\mathbf{Ax}-\mathbf{b}\|_p^r + \lambda\|\mathbf{x}\|_q^s$. Prior work has shown that for ridge regression (where $p,q,r,s=2$) we can obtain a coreset that is smaller than the coreset for the unregularized counterpart, i.e., least squares regression (Avron et al.). We show that when $r \neq s$, no coreset for regularized regression can have size smaller than the optimal coreset of the unregularized version. The well-known lasso problem falls under this category and hence does not allow a coreset smaller than the one for least squares regression. We propose a modified version of the lasso problem and obtain for it a coreset smaller than the one for least squares regression. We empirically show that the modified version of lasso also induces sparsity in the solution, similar to the original lasso. We also obtain smaller coresets for $\ell_p$ regression with $\ell_p$ regularization. We extend our methods to multi-response regularized regression. Finally, we empirically demonstrate the coreset performance for the modified lasso and for $\ell_1$ regression with $\ell_1$ regularization.
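To make the objective concrete, below is a minimal sketch (not the paper's exact construction) of coreset building for the ridge-regression case ($p,q,r,s=2$): rows of $(\mathbf{A},\mathbf{b})$ are sampled with probability proportional to their $\lambda$-regularized (ridge) leverage scores, which sum to the statistical dimension $\mathrm{sd}_\lambda(\mathbf{A}) \le d$, the quantity behind the smaller coreset size. The function names and the use of NumPy are illustrative assumptions.

```python
import numpy as np

def ridge_leverage_scores(A, lam):
    """Ridge leverage scores tau_i = a_i^T (A^T A + lam*I)^{-1} a_i.
    They sum to the statistical dimension sd_lam(A) <= d, which is why
    ridge regression can admit coresets smaller than those for
    unregularized least squares."""
    n, d = A.shape
    G = A.T @ A + lam * np.eye(d)        # regularized Gram matrix (d x d)
    Ginv_At = np.linalg.solve(G, A.T)    # G^{-1} A^T, shape (d x n)
    # Diagonal of A G^{-1} A^T, one score per row of A.
    return np.einsum('ij,ji->i', A, Ginv_At)

def sample_coreset(A, b, lam, m, seed=None):
    """Sample m rows with probability proportional to ridge leverage
    scores; rescale by 1/sqrt(m*p_i) so the sampled data term is an
    unbiased estimate of ||Ax - b||_2^2."""
    rng = np.random.default_rng(seed)
    tau = ridge_leverage_scores(A, lam)
    p = tau / tau.sum()
    idx = rng.choice(A.shape[0], size=m, replace=True, p=p)
    w = 1.0 / np.sqrt(m * p[idx])        # importance-sampling weights
    return w[:, None] * A[idx], w * b[idx]

if __name__ == "__main__":
    # Tiny usage example on random data with n >> d.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((10_000, 20))
    b = rng.standard_normal(10_000)
    lam = 1.0
    As, bs = sample_coreset(A, b, lam, m=500, seed=1)
    # Ridge solution on the full data vs. on the coreset; the
    # lam*||x||^2 term is kept exactly, only the data term is sampled.
    x_full = np.linalg.solve(A.T @ A + lam * np.eye(20), A.T @ b)
    x_core = np.linalg.solve(As.T @ As + lam * np.eye(20), As.T @ bs)
    print(np.linalg.norm(x_full - x_core))
```

Note the design choice this illustrates: regularization only helps shrink the coreset when the sampling probabilities can exploit it (here, via the $\lambda$-dependent scores); the paper's negative result says that for $r \neq s$ (e.g., the lasso) no such saving is possible.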
