Paper Title
The Effects of Mild Over-parameterization on the Optimization Landscape of Shallow ReLU Neural Networks
Paper Authors
Paper Abstract
We study the effects of mild over-parameterization on the optimization landscape of a simple ReLU neural network of the form $\mathbf{x}\mapsto\sum_{i=1}^k\max\{0,\mathbf{w}_i^{\top}\mathbf{x}\}$, in a well-studied teacher-student setting where the target values are generated by the same architecture, and when directly optimizing over the population squared loss with respect to Gaussian inputs. We prove that while the objective is strongly convex around the global minima when the teacher and student networks possess the same number of neurons, it is not even \emph{locally convex} after any amount of over-parameterization. Moreover, related desirable properties (e.g., one-point strong convexity and the Polyak-Łojasiewicz condition) also do not hold even locally. On the other hand, we establish that the objective remains one-point strongly convex in \emph{most} directions (suitably defined), and show an optimization guarantee under this property. For the non-global minima, we prove that adding even just a single neuron will turn a non-global minimum into a saddle point. This holds under some technical conditions which we validate empirically. These results provide a possible explanation for why recovering a global minimum becomes significantly easier when we over-parameterize, even if the amount of over-parameterization is very moderate.
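For concreteness, the population objective described in the abstract can be written as follows (a sketch of the setup only; the notation for the teacher weights $\mathbf{v}_1,\dots,\mathbf{v}_n$ and the teacher width $n\le k$ is assumed here, not taken verbatim from the abstract):
$$
F(\mathbf{w}_1,\dots,\mathbf{w}_k)\;=\;\mathbb{E}_{\mathbf{x}\sim\mathcal{N}(\mathbf{0},I_d)}\left[\left(\sum_{i=1}^{k}\max\{0,\mathbf{w}_i^{\top}\mathbf{x}\}\;-\;\sum_{j=1}^{n}\max\{0,\mathbf{v}_j^{\top}\mathbf{x}\}\right)^{2}\right],
$$
where the exactly-parameterized case corresponds to $k=n$ and mild over-parameterization to $k$ slightly larger than $n$.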