Paper Title
Measuring Model Complexity of Neural Networks with Curve Activation Functions
Paper Authors
Paper Abstract
It is fundamental to measure the model complexity of deep neural networks. The existing literature on model complexity mainly focuses on neural networks with piecewise linear activation functions; the model complexity of neural networks with general curve activation functions remains an open problem. To tackle this challenge, in this paper we first propose the linear approximation neural network (LANN for short), a piecewise linear framework that approximates a given deep model with curve activation functions. A LANN constructs an individual piecewise linear approximation of each neuron's activation function, and minimizes the number of linear regions needed to satisfy a required approximation degree. We then analyze the upper bound on the number of linear regions formed by LANNs, and derive a complexity measure based on this upper bound. To examine the usefulness of the complexity measure, we experimentally explore the training process of neural networks and detect overfitting. Our results demonstrate that the occurrence of overfitting is positively correlated with the increase of model complexity during training. We also find that $L^1$ and $L^2$ regularization suppress the increase of model complexity. Finally, we propose two approaches that prevent overfitting by directly constraining model complexity: neuron pruning and customized $L^1$ regularization.
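To make the core idea concrete, the following is a minimal sketch of how one might approximate a single curve activation function (here, tanh) with as few linear pieces as possible while meeting an error tolerance, in the spirit of LANN's per-neuron construction. It uses a simple greedy bisection of the worst-fitting segment; the function name, the tolerance parameter, and the bisection strategy are illustrative assumptions, not the authors' exact procedure.

import numpy as np

def piecewise_linear_approx(f, lo, hi, tol, max_pieces=64):
    # Greedily insert breakpoints until the chord on every segment
    # approximates f within `tol` (max absolute error on a sample grid).
    # Illustrative stand-in for LANN's per-neuron approximation.
    xs = [lo, hi]
    while len(xs) - 1 < max_pieces:
        worst_err, worst_seg = 0.0, None
        for a, b in zip(xs[:-1], xs[1:]):
            grid = np.linspace(a, b, 32)
            chord = f(a) + (f(b) - f(a)) * (grid - a) / (b - a)
            err = np.max(np.abs(f(grid) - chord))
            if err > worst_err:
                worst_err, worst_seg = err, (a, b)
        if worst_err <= tol:
            break  # required approximation degree reached
        xs.append(0.5 * (worst_seg[0] + worst_seg[1]))  # bisect worst segment
        xs.sort()
    return np.array(xs)

breaks = piecewise_linear_approx(np.tanh, -4.0, 4.0, tol=0.01)
print(f"{len(breaks) - 1} linear pieces approximate tanh within 0.01")

In this sketch, the number of pieces returned for a given tolerance plays the role that the count of linear regions plays in the paper: the more pieces a curve activation needs, the more it contributes to the complexity measure derived from the LANN's upper bound.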