Paper Title
Enabling Mixed-Precision Quantized Neural Networks in Extreme-Edge Devices
Paper Authors
Abstract
The deployment of Quantized Neural Networks (QNNs) on advanced microcontrollers requires optimized software to exploit the digital signal processing (DSP) extensions of modern instruction set architectures (ISAs). As such, recent research has proposed optimized libraries for QNNs (from 8-bit down to 2-bit), such as CMSIS-NN and PULP-NN. This work presents an extension to the PULP-NN library targeting the acceleration of mixed-precision Deep Neural Networks, an emerging paradigm able to significantly shrink the memory footprint of deep neural networks with negligible accuracy loss. The library, composed of 27 kernels, one for each combination of input feature map, weight, and output feature map precision (considering 8-bit, 4-bit, and 2-bit), enables efficient inference of QNNs on parallel ultra-low-power (PULP) clusters of RISC-V based processors featuring the RV32IMCXpulpV2 ISA. The proposed solution, benchmarked on an 8-core GAP-8 PULP cluster, reaches a peak performance of 16 MACs/cycle on 8 cores, performing 21x to 25x faster than an STM32H7 (powered by an ARM Cortex-M7 processor) with 15x to 21x better energy efficiency.