Paper Title
A Generalized Neural Tangent Kernel Analysis for Two-layer Neural Networks
Paper Authors
Paper Abstract
A recent breakthrough in deep learning theory shows that the training of over-parameterized deep neural networks can be characterized by a kernel function called the \textit{neural tangent kernel} (NTK). However, it is known that this type of result does not perfectly match practice, as NTK-based analysis requires the network weights to stay very close to their initialization throughout training, and cannot handle regularizers or gradient noise. In this paper, we provide a generalized neural tangent kernel analysis and show that noisy gradient descent with weight decay can still exhibit a "kernel-like" behavior. This implies that the training loss converges linearly up to a certain accuracy. We also establish a novel generalization error bound for two-layer neural networks trained by noisy gradient descent with weight decay.
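The abstract refers to training a two-layer network by noisy gradient descent with weight decay. Below is a minimal illustrative sketch of such an update rule on a squared loss, assuming a two-layer ReLU network with a fixed second layer; the width, step size, weight-decay coefficient, and Gaussian noise scale are illustrative assumptions, not the paper's actual setting.

```python
# Minimal sketch (not the paper's exact setup): noisy gradient descent with
# weight decay on a two-layer ReLU network under squared loss.
import numpy as np

rng = np.random.default_rng(0)

n, d, m = 50, 10, 256                              # samples, input dim, hidden width (assumed)
X = rng.standard_normal((n, d)) / np.sqrt(d)       # synthetic inputs
y = rng.standard_normal(n)                         # synthetic targets

W = rng.standard_normal((m, d)) / np.sqrt(d)       # trainable first-layer weights
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)   # fixed second-layer weights (a common NTK-style setup)

eta, lam, tau = 0.1, 1e-3, 1e-3                    # step size, weight decay, noise level (assumed)

def forward(W):
    """f(x) = a^T * ReLU(W x), evaluated on all rows of X."""
    return np.maximum(X @ W.T, 0.0) @ a

for t in range(200):
    err = forward(W) - y                           # residuals of the squared loss
    act = (X @ W.T > 0).astype(float)              # n x m ReLU activation pattern
    grad = ((err[:, None] * act) * a).T @ X / n    # gradient of (1/2n)||f(X)-y||^2 w.r.t. W
    noise = tau * rng.standard_normal(W.shape)     # injected Gaussian gradient noise
    # Noisy gradient step with weight decay; sqrt(eta) noise scaling is one common choice.
    W = (1 - eta * lam) * W - eta * grad + np.sqrt(eta) * noise
    if t % 50 == 0:
        print(f"step {t:3d}  loss {0.5 * np.mean(err**2):.4f}")
```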