Paper Title

A Deep Conditioning Treatment of Neural Networks

Authors

Naman Agarwal, Pranjal Awasthi, Satyen Kale

Abstract

We study the role of depth in training randomly initialized overparameterized neural networks. We give a general result showing that depth improves trainability of neural networks by improving the conditioning of certain kernel matrices of the input data. This result holds for arbitrary non-linear activation functions under a certain normalization. We provide versions of the result that hold for training just the top layer of the neural network, as well as for training all layers, via the neural tangent kernel. As applications of these general results, we provide a generalization of the results of Das et al. (2019) showing that learnability of deep random neural networks with a large class of non-linear activations degrades exponentially with depth. We also show how benign overfitting can occur in deep neural networks via the results of Bartlett et al. (2019b). We also give experimental evidence that normalized versions of ReLU are a viable alternative to more complex operations like Batch Normalization in training deep neural networks.
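
The abstract notes that normalized versions of ReLU can serve as an alternative to Batch Normalization when training deep networks. As a rough illustration only (the precise normalization used in the paper is not reproduced here), the sketch below assumes the common convention of rescaling the activation so that its second moment equals 1 under a standard Gaussian pre-activation; for ReLU that scale factor is sqrt(2). The function name normalized_relu is illustrative, not taken from the paper.

import numpy as np

def normalized_relu(x):
    # Scale ReLU so that E[sigma(g)^2] = 1 when g ~ N(0, 1).
    # For a standard Gaussian, E[max(g, 0)^2] = 1/2, hence the sqrt(2) factor.
    return np.sqrt(2.0) * np.maximum(x, 0.0)

# Empirical check of the unit-second-moment property.
rng = np.random.default_rng(0)
g = rng.standard_normal(1_000_000)
print(np.mean(normalized_relu(g) ** 2))  # approximately 1.0

Keeping the post-activation second moment near 1 helps prevent the scale of hidden representations from collapsing or exploding with depth, which is roughly the role Batch Normalization otherwise plays in this setting.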
