Title
Effects of the Nonlinearity in Activation Functions on the Performance of Deep Learning Models
Authors
Abstract
The nonlinearity of the activation functions used in deep learning models is crucial for the success of predictive models. Several simple nonlinear functions are commonly used, including the Rectified Linear Unit (ReLU) and the Leaky-ReLU (L-ReLU). In practice, these functions remarkably enhance model accuracy. However, there is limited insight into how these nonlinear activation functions work and why certain models perform better than others. Here, we investigate model performance when ReLU or L-ReLU is used as the activation function in different model architectures and data domains. Interestingly, we found that L-ReLU is mostly effective when the number of trainable parameters in a model is relatively small. Furthermore, we found that image classification models seem to perform well with L-ReLU in fully connected layers, especially when pre-trained models such as VGG-16 are used for transfer learning.
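
For concreteness, the sketch below (not taken from the paper) contrasts the two activation functions and shows one way an L-ReLU fully connected head could be attached to a frozen VGG-16 base for transfer learning. It assumes TensorFlow/Keras; the layer width, the slope of 0.1, and the 10-class output are placeholder choices.

    from tensorflow.keras import layers, models
    from tensorflow.keras.applications import VGG16

    # ReLU:   f(x) = max(0, x)            -- zero gradient for x < 0
    # L-ReLU: f(x) = x if x > 0 else a*x  -- small slope a keeps a nonzero gradient

    # Frozen VGG-16 convolutional base used as a feature extractor (transfer learning).
    base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
    base.trainable = False

    # Fully connected head with Leaky-ReLU activations.
    model = models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(256),
        layers.LeakyReLU(0.1),                   # negative slope; 0.1 is illustrative
        layers.Dense(10, activation="softmax"),  # placeholder number of classes
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])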