Paper Title
Small Covers for Near-Zero Sets of Polynomials and Learning Latent Variable Models
Paper Authors
Paper Abstract
Let $V$ be any vector space of multivariate degree-$d$ homogeneous polynomials with co-dimension at most $k$, and let $S$ be the set of points where all polynomials in $V$ {\em nearly} vanish. We establish a qualitatively optimal upper bound on the size of $ε$-covers for $S$ in the $\ell_2$-norm. Roughly speaking, we show that there exists an $ε$-cover for $S$ of cardinality $M = (k/ε)^{O_d(k^{1/d})}$. Our result is constructive, yielding an algorithm to compute such an $ε$-cover that runs in time $\mathrm{poly}(M)$.

Building on our structural result, we obtain significantly improved learning algorithms for several fundamental high-dimensional probabilistic models with hidden variables. These include density and parameter estimation for $k$-mixtures of spherical Gaussians (with known common covariance), PAC learning one-hidden-layer ReLU networks with $k$ hidden units (under the Gaussian distribution), density and parameter estimation for $k$-mixtures of linear regressions (with Gaussian covariates), and parameter estimation for $k$-mixtures of hyperplanes. Our algorithms run in time {\em quasi-polynomial} in the parameter $k$. Previous algorithms for these problems had running times exponential in $k^{Ω(1)}$.

At a high level, our algorithms for all these learning problems work as follows: by computing the low-degree moments of the hidden parameters, we find a vector space of polynomials that nearly vanish on the unknown parameters. Our structural result then allows us to compute a quasi-polynomial-sized cover for the set of hidden parameters, which we exploit in our learning algorithms.
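To make the moment-based step concrete, here is a minimal, hypothetical Python sketch (not the paper's implementation): assuming access to a noisy flattening of the order-$2d$ moments of the hidden parameters, it recovers a near-vanishing space of degree-$d$ polynomials from the small eigenvectors of that moment matrix. All concrete choices (`mono_vec`, the noise scale, the eigenvalue threshold) are illustrative assumptions, not quantities from the paper.

```python
import itertools
import numpy as np

# Hypothetical sketch: recover a vector space of degree-d homogeneous
# polynomials that nearly vanish on hidden parameters theta_1..theta_k,
# using only a (noisy) flattened moment matrix of the thetas.

rng = np.random.default_rng(0)
n, k, d = 4, 3, 2  # ambient dimension, number of components, polynomial degree

# Hidden parameters (unknown to the learner) and mixing weights.
thetas = rng.normal(size=(k, n))
weights = np.full(k, 1.0 / k)

# All degree-d monomials x^alpha with |alpha| = d.
monos = [a for a in itertools.product(range(d + 1), repeat=n) if sum(a) == d]

def mono_vec(x):
    """Evaluate every degree-d monomial at the point x."""
    return np.array([np.prod(x ** np.array(a)) for a in monos])

# Flattened moment matrix  M = sum_i w_i m_d(theta_i) m_d(theta_i)^T.
# In the learning applications this matrix is estimated from observable
# moments of the samples; small noise mimics that estimation error.
M = sum(w * np.outer(mono_vec(t), mono_vec(t)) for w, t in zip(weights, thetas))
M = M + 1e-8 * rng.normal(size=M.shape)

# An eigenvector c with tiny eigenvalue satisfies
#   sum_i w_i <c, m_d(theta_i)>^2 ~ 0,
# i.e. the polynomial with coefficient vector c nearly vanishes on every theta_i.
evals, evecs = np.linalg.eigh(M)
V = evecs[:, evals < 1e-6]  # basis for the near-vanishing space

# Sanity check: each recovered polynomial is near zero at every theta_i,
# and the vanishing space has co-dimension at most k.
for t in thetas:
    assert np.all(np.abs(V.T @ mono_vec(t)) < 1e-3)
print("co-dimension of recovered space:", M.shape[0] - V.shape[1])
```

The second step, turning such a near-vanishing space into a quasi-polynomial-sized $ε$-cover of the candidate parameters, is the paper's structural contribution and is substantially more involved; the sketch above only illustrates how the near-vanishing space arises from low-degree moments.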