Paper Title
Optimizing Grouped Convolutions on Edge Devices
Paper Authors
Paper Abstract
When deploying a deep neural network on constrained hardware, it is possible to replace the network's standard convolutions with grouped convolutions. This allows for substantial memory savings with minimal loss of accuracy. However, current implementations of grouped convolutions in modern deep learning frameworks are far from optimal in terms of speed. In this paper, we propose Grouped Spatial Pack Convolutions (GSPC), a new implementation of grouped convolutions that outperforms existing solutions. We implement GSPC in TVM, which provides state-of-the-art performance on edge devices. We analyze a set of networks utilizing different types of grouped convolutions and evaluate their performance in terms of inference time on several edge devices. We observe that our new implementation scales well with the number of groups and provides the best inference times in all settings, improving the existing implementations of grouped convolutions in TVM, PyTorch and TensorFlow Lite by 3.4x, 8x and 4x on average, respectively. Code is available at https://github.com/gecLAB/tvm-GSPC/
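To illustrate the memory savings the abstract refers to, the following minimal sketch (not from the paper; the channel sizes and group count are hypothetical values chosen for illustration) compares the parameter counts of a standard and a grouped convolution using PyTorch's `groups` argument to `nn.Conv2d`:

```python
import torch.nn as nn

# Hypothetical layer sizes, chosen for illustration only.
in_channels, out_channels, kernel_size, groups = 256, 256, 3, 8

# Standard convolution: weight shape (256, 256, 3, 3).
standard = nn.Conv2d(in_channels, out_channels, kernel_size, padding=1)

# Grouped convolution: each of the 8 groups convolves only
# 256 / 8 = 32 input channels, so the weight shape is (256, 32, 3, 3).
grouped = nn.Conv2d(in_channels, out_channels, kernel_size, padding=1,
                    groups=groups)

def count_params(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())

print(f"standard: {count_params(standard):,} parameters")  # 590,080
print(f"grouped:  {count_params(grouped):,} parameters")   # 73,984, ~8x fewer
```

The weight tensor shrinks by roughly a factor of `groups`, which is the memory saving described above; inference speed, however, depends on how the framework's kernels handle the grouped case, which is the gap GSPC targets.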