Paper Title

Training Deep Neural Networks Without Batch Normalization

Paper Authors

Divya Gaur, Joachim Folz, Andreas Dengel

Paper Abstract

Training neural networks is an optimization problem, and finding a decent set of parameters through gradient descent can be a difficult task. A host of techniques has been developed to aid this process before and during the training phase. One of the most important and widely used classes of methods is normalization. It is generally favorable for neurons to receive inputs that are distributed with zero mean and unit variance, so we use statistics about the dataset to normalize them before the first layer. However, this property cannot be guaranteed for the intermediate activations inside the network. A widely used method to enforce this property inside the network is batch normalization. It was developed to combat covariate shift inside networks. Empirically it is known to work, but there is a lack of theoretical understanding about its effectiveness and the potential drawbacks it might have when used in practice. This work studies batch normalization in detail, while comparing it with other methods such as weight normalization, gradient clipping and dropout. The main purpose of this work is to determine whether networks can be trained effectively when batch normalization is removed by adapting the training process.
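For reference, the normalization operation the abstract describes (standardizing activations to zero mean and unit variance over a mini-batch, per feature) can be sketched in a few lines of NumPy. This is a minimal illustrative sketch, not code from the paper; the function name batch_norm_forward and the gamma, beta, and eps parameters are assumptions chosen for the example.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Standardize each feature of a mini-batch to zero mean / unit variance,
    then apply a learnable scale (gamma) and shift (beta).
    x: array of shape (batch_size, num_features).
    Note: this is an illustrative sketch, not the paper's implementation."""
    mu = x.mean(axis=0)                    # per-feature mean over the batch
    var = x.var(axis=0)                    # per-feature variance over the batch
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalized activations
    return gamma * x_hat + beta            # rescaled and shifted output

# Usage: a batch of 4 samples with 3 features, deliberately off-center.
x = np.random.randn(4, 3) * 10 + 5
y = batch_norm_forward(x, gamma=np.ones(3), beta=np.zeros(3))
print(y.mean(axis=0), y.var(axis=0))       # approximately 0 and 1 per feature
```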
