Paper Title
Improve Generalization and Robustness of Neural Networks via Weight Scale Shifting Invariant Regularizations
Paper Authors
Paper Abstract
Using weight decay to penalize the L2 norms of weights in neural networks has been a standard training practice for regularizing the complexity of networks. In this paper, we show that a family of regularizers, including weight decay, is ineffective at penalizing the intrinsic norms of weights for networks with positively homogeneous activation functions, such as linear, ReLU, and max-pooling functions. As a result of homogeneity, the functions specified by such networks are invariant to shifts of weight scale between layers. These regularizers are sensitive to such shifts and thus poorly regularize the model capacity, leading to overfitting. To address this shortcoming, we propose an improved regularizer that is invariant to weight scale shifting and thus effectively constrains the intrinsic norm of a neural network. The derived regularizer is an upper bound on the input gradient of the network, so minimizing it also benefits adversarial robustness. We further consider residual connections and show that our regularizer likewise forms an upper bound on the input gradients of such residual networks. We demonstrate the efficacy of our proposed regularizer on various datasets and neural network architectures at improving generalization and adversarial robustness.
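As a concrete illustration of the scale-shift invariance described in the abstract, the minimal NumPy sketch below rescales the weights of a two-layer ReLU network by c and 1/c: the network function is unchanged (by positive homogeneity of ReLU), yet the standard L2 weight-decay penalty changes with c. A product of layer-wise norms is shown only as one simple quantity that is invariant to such shifts; it is an illustrative assumption here and not necessarily the exact form of the paper's proposed regularizer.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny 2-layer ReLU network: f(x) = W2 @ relu(W1 @ x)
W1 = rng.normal(size=(8, 4))
W2 = rng.normal(size=(3, 8))
x = rng.normal(size=(4,))

def relu(z):
    return np.maximum(z, 0.0)

def f(W1, W2, x):
    return W2 @ relu(W1 @ x)

def l2_penalty(W1, W2):
    # Standard weight decay: sum of squared Frobenius norms of the layers.
    return np.sum(W1 ** 2) + np.sum(W2 ** 2)

def product_penalty(W1, W2):
    # One simple scale-shift-invariant quantity: the product of layer norms.
    # (Illustrative only; the paper's regularizer may take a different form.)
    return np.linalg.norm(W1) * np.linalg.norm(W2)

# Shift weight scale between layers: W1 -> c * W1, W2 -> W2 / c.
c = 10.0
W1_shifted, W2_shifted = c * W1, W2 / c

# The function is unchanged because ReLU is positively homogeneous:
# relu(c * z) = c * relu(z) for c > 0.
assert np.allclose(f(W1, W2, x), f(W1_shifted, W2_shifted, x))

# ...but the L2 penalty changes drastically, while the product does not.
print("L2 penalty:     ", l2_penalty(W1, W2), "->", l2_penalty(W1_shifted, W2_shifted))
print("Product penalty:", product_penalty(W1, W2), "->", product_penalty(W1_shifted, W2_shifted))
```

Because the shifted and unshifted weights realize exactly the same function, any regularizer that assigns them different penalties (such as weight decay) cannot be measuring the function's intrinsic complexity; the shift-invariant product above does not suffer from this ambiguity.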