Paper Title

Understanding How Over-Parametrization Leads to Acceleration: A case of learning a single teacher neuron

Paper Authors

Jun-Kun Wang, Jacob Abernethy

Paper Abstract

Over-parametrization has become a popular technique in deep learning. It is observed that, with over-parametrization, a larger neural network needs fewer training iterations than a smaller one to achieve a certain level of performance -- namely, over-parametrization leads to acceleration in optimization. However, although over-parametrization is widely used nowadays, little theory is available to explain the acceleration it brings. In this paper, we propose understanding it by studying a simple problem first. Specifically, we consider a setting in which there is a single teacher neuron with quadratic activation, and over-parametrization is realized by having multiple student neurons learn the data generated by the teacher neuron. We provably show that over-parametrization helps the iterates generated by gradient descent enter the neighborhood of a globally optimal solution that achieves zero testing error more quickly. On the other hand, we also point out an issue regarding the necessity of over-parametrization and study how the scaling of the output neurons affects the convergence time.
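
As a concrete illustration of the teacher-student setting described in the abstract, the following is a minimal sketch, not the paper's exact construction: a single teacher neuron with quadratic activation generates the labels, and k student neurons (over-parametrized when k > 1) are trained by plain gradient descent on the squared loss. All names and hyperparameters here (d, k, n, the 1/k output scaling, the learning rate) are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

# Minimal sketch (assumed parametrization, not the paper's exact construction):
# teacher y = (w_star^T x)^2; student f(x) = (1/k) * sum_j (w_j^T x)^2,
# trained with plain gradient descent on the squared loss.
rng = np.random.default_rng(0)
d, k, n = 10, 5, 2000              # input dim, number of student neurons, samples (illustrative)

w_star = rng.normal(size=d)
w_star /= np.linalg.norm(w_star)   # teacher weight vector, unit norm

X = rng.normal(size=(n, d))
y = (X @ w_star) ** 2              # labels from the teacher's quadratic activation

W = 0.01 * rng.normal(size=(k, d)) # small random init; k > 1 realizes over-parametrization
lr = 0.05                          # learning rate (illustrative)

for t in range(500):
    z = X @ W.T                    # (n, k): pre-activations of the student neurons
    pred = (z ** 2).mean(axis=1)   # 1/k output scaling is one possible choice
    resid = pred - y
    loss = 0.5 * np.mean(resid ** 2)
    # gradient of the loss w.r.t. each student weight w_j:
    # (2 / (n * k)) * sum_i resid_i * (w_j^T x_i) * x_i
    grad = (2.0 / (n * k)) * (resid[:, None] * z).T @ X
    W -= lr * grad
    if t % 100 == 0:
        print(f"iter {t:4d}  train loss {loss:.6f}")
```

In this sketch, increasing k is the over-parametrization knob, and replacing the 1/k factor with another constant is one way to probe how the scaling of the output neurons changes the convergence time, as the abstract discusses.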
