Paper Title

Deeply Shared Filter Bases for Parameter-Efficient Convolutional Neural Networks

Paper Authors

Woochul Kang, Daeyeon Kim

Paper Abstract

Modern convolutional neural networks (CNNs) have massive identical convolution blocks, and, hence, recursive sharing of parameters across these blocks has been proposed to reduce the number of parameters. However, naive sharing of parameters poses many challenges, such as limited representational power and the vanishing/exploding gradients problem of recursively shared parameters. In this paper, we present a recursive convolution block design and training method in which a recursively shareable part, or a filter basis, is separated and learned while effectively avoiding the vanishing/exploding gradients problem during training. We show that the unwieldy vanishing/exploding gradients problem can be controlled by enforcing the elements of the filter basis to be orthonormal, and empirically demonstrate that the proposed orthogonality regularization improves the flow of gradients during training. Experimental results on image classification and object detection show that our approach, unlike previous parameter-sharing approaches, does not trade performance to save parameters and consistently outperforms overparameterized counterpart networks. This superior performance demonstrates that the proposed recursive convolution block design and the orthogonality regularization not only prevent performance degradation but also consistently improve representation capability while a significant number of parameters are recursively shared.
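
To make the idea in the abstract concrete, the following PyTorch sketch is a minimal illustration, not the authors' released implementation: the module name SharedBasisBlock, the per-step 1x1 coefficient layers, and the penalty weighting are all our own assumptions. It shows a convolution block that reuses one filter basis across several recursive steps, together with an orthogonality regularizer that penalizes deviation of the basis elements from orthonormality, as the abstract describes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedBasisBlock(nn.Module):
    """Minimal sketch of a recursive convolution block with a shared filter basis.

    One filter basis is reused at every recursive step; only small per-step
    1x1 coefficient layers differ. Names and structure are illustrative
    assumptions, not the authors' code.
    """

    def __init__(self, channels, num_basis=16, num_steps=3, kernel_size=3):
        super().__init__()
        # Shared filter basis: (num_basis, channels, k, k), reused at every step.
        self.basis = nn.Parameter(
            torch.randn(num_basis, channels, kernel_size, kernel_size) * 0.1
        )
        # Per-step 1x1 convolutions that recombine the basis responses.
        self.coeffs = nn.ModuleList(
            nn.Conv2d(num_basis, channels, kernel_size=1, bias=False)
            for _ in range(num_steps)
        )
        self.bns = nn.ModuleList(nn.BatchNorm2d(channels) for _ in range(num_steps))

    def forward(self, x):
        pad = self.basis.shape[-1] // 2
        for coeff, bn in zip(self.coeffs, self.bns):
            # Project onto the shared basis, then recombine with step-specific layers.
            y = F.conv2d(x, self.basis, padding=pad)
            x = F.relu(bn(coeff(y)))
        return x

    def orthogonality_penalty(self):
        # Penalize deviation of the flattened basis filters from orthonormality:
        # || B B^T - I ||_F^2, where the rows of B are the flattened basis filters.
        B = self.basis.flatten(1)            # (num_basis, channels * k * k)
        gram = B @ B.t()                     # (num_basis, num_basis)
        eye = torch.eye(gram.size(0), device=B.device, dtype=B.dtype)
        return ((gram - eye) ** 2).sum()
```

During training, the regularizer would simply be added to the task loss, e.g. `loss = criterion(logits, labels) + lambda_ortho * block.orthogonality_penalty()`, where `lambda_ortho` is a hypothetical weighting hyperparameter.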
