Paper Title
Superpolynomial Lower Bounds for Learning One-Layer Neural Networks using Gradient Descent
Paper Authors
Paper Abstract
We prove the first superpolynomial lower bounds for learning one-layer neural networks with respect to the Gaussian distribution using gradient descent. We show that any classifier trained using gradient descent with respect to square-loss will fail to achieve small test error in polynomial time given access to samples labeled by a one-layer neural network. For classification, we give a stronger result, namely that any statistical query (SQ) algorithm (including gradient descent) will fail to achieve small test error in polynomial time. Prior work held only for gradient descent run with small batch sizes, required sharp activations, and applied to specific classes of queries. Our lower bounds hold for broad classes of activations including ReLU and sigmoid. The core of our result relies on a novel construction of a simple family of neural networks that are exactly orthogonal with respect to all spherically symmetric distributions.
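The following is a minimal sketch, not taken from the paper, of the learning setting the abstract describes: a "teacher" one-layer ReLU network labels Gaussian examples, and a "student" of the same form is trained by full-batch gradient descent on the square loss. All names, dimensions, and hyperparameters here are illustrative assumptions; the paper's lower-bound construction itself is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, n = 20, 5, 4096          # input dimension, hidden units, sample size
relu = lambda z: np.maximum(z, 0.0)

# Teacher: f*(x) = sum_i relu(<w*_i, x>), with unit-norm rows w*_i (assumed form).
W_star = rng.standard_normal((k, d))
W_star /= np.linalg.norm(W_star, axis=1, keepdims=True)

X = rng.standard_normal((n, d))        # Gaussian examples
y = relu(X @ W_star.T).sum(axis=1)     # labels produced by the teacher network

# Student: same architecture, trained by gradient descent on the square loss
#   L(W) = (1/2n) * sum_j (f_W(x_j) - y_j)^2.
W = 0.1 * rng.standard_normal((k, d))
lr = 0.01
for step in range(2000):
    pre = X @ W.T                      # (n, k) pre-activations
    pred = relu(pre).sum(axis=1)       # student outputs f_W(x_j)
    resid = pred - y
    # dL/dW_i = (1/n) * sum_j resid_j * 1[<w_i, x_j> > 0] * x_j
    grad = ((pre > 0).astype(float) * resid[:, None]).T @ X / n
    W -= lr * grad

print("final square loss:", 0.5 * np.mean((relu(X @ W.T).sum(axis=1) - y) ** 2))
```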