Paper Title
FSCNN: A Fast Sparse Convolution Neural Network Inference System
Paper Authors
Paper Abstract
Convolutional neural networks (CNNs) have achieved remarkable success, but they typically come with high computation costs and numerous redundant weight parameters. To reduce FLOPs, structured pruning is a popular approach that removes entire hidden structures by introducing coarse-grained sparsity. Meanwhile, plenty of pruning works instead leverage fine-grained sparsity, where the zeros are randomly distributed across the weights, yet such sparse models lack specially designed computing libraries to realize the potential speedup. In this technical report, we study and present an efficient convolutional neural network inference system, FSCNN, that accelerates the forward pass by exploiting the fine-grained sparsity of compressed CNNs. FSCNN is built on a set of specially designed sparse data structures, operators, and associated algorithms. Experimentally, we validate that FSCNN outperforms the standard deep learning library PyTorch on popular CNN architectures such as VGG16 when the sparsity is sufficiently high. However, due to the contiguity issue of sparse operators, FSCNN is typically not competitive with highly optimized dense operators. Therefore, we recommend coarse-grained (structured) sparsity for generic model compression.
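The abstract does not specify FSCNN's internal data structures. As a rough illustration of the idea it describes, the sketch below implements a fine-grained sparse convolution using a generic CSR (compressed sparse row) weight layout over an im2col lowering. The CSR format, the im2col strategy, and all function names here are assumptions for illustration only, not FSCNN's actual design or API.

# Illustrative sketch only: CSR-format sparse convolution via im2col.
# This is NOT FSCNN's implementation; it merely demonstrates how skipping
# zero weights trades FLOPs for irregular memory access.
import numpy as np

def to_csr(dense):
    """Convert a 2-D dense matrix to CSR arrays (values, column indices, row pointers)."""
    vals, cols, rowptr = [], [], [0]
    for row in dense:
        nz = np.nonzero(row)[0]
        vals.extend(row[nz])
        cols.extend(nz)
        rowptr.append(len(vals))
    return np.array(vals), np.array(cols), np.array(rowptr)

def im2col(x, k, stride=1):
    """Unfold a CHW input into a (C*k*k, L) patch matrix, L = number of output positions."""
    c, h, w = x.shape
    oh, ow = (h - k) // stride + 1, (w - k) // stride + 1
    patches = np.empty((c * k * k, oh * ow))
    idx = 0
    for i in range(0, h - k + 1, stride):
        for j in range(0, w - k + 1, stride):
            patches[:, idx] = x[:, i:i + k, j:j + k].ravel()
            idx += 1
    return patches, oh, ow

def sparse_conv2d(vals, cols, rowptr, x, k, out_channels, stride=1):
    """Sparse weights (CSR, shape out_channels x C*k*k) times im2col patches.

    Multiplying only the stored nonzeros skips the pruned weights' FLOPs,
    but the gather patches[cols[lo:hi]] is the kind of non-contiguous
    access that limits fine-grained sparse operators in practice."""
    patches, oh, ow = im2col(x, k, stride)
    out = np.zeros((out_channels, patches.shape[1]))
    for r in range(out_channels):
        lo, hi = rowptr[r], rowptr[r + 1]
        out[r] = vals[lo:hi] @ patches[cols[lo:hi]]
    return out.reshape(out_channels, oh, ow)

# Hypothetical usage: an 8-filter 3x3 convolution on a 3-channel input,
# with 90% of the weights pruned to zero (fine-grained, randomly placed).
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 3 * 3 * 3))
w[rng.random(w.shape) < 0.9] = 0.0
y = sparse_conv2d(*to_csr(w), rng.standard_normal((3, 32, 32)), k=3, out_channels=8)

In this toy version, the arithmetic saved by skipping zeros is offset by the column-index gather, which breaks memory contiguity. That trade-off is consistent with the report's conclusion: fine-grained sparse kernels pay off only at sufficiently high sparsity, while dense or structured-sparse operators keep contiguous, cache-friendly access patterns.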