Paper Title

EdgeDRNN: Recurrent Neural Network Accelerator for Edge Inference

Authors

Chang Gao, Antonio Rios-Navarro, Xi Chen, Shih-Chii Liu, Tobi Delbruck

Abstract


Low-latency, low-power portable recurrent neural network (RNN) accelerators offer powerful inference capabilities for real-time applications such as IoT, robotics, and human-machine interaction. We propose a lightweight Gated Recurrent Unit (GRU)-based RNN accelerator called EdgeDRNN that is optimized for low-latency edge RNN inference with a batch size of 1. EdgeDRNN adopts the spiking-neural-network-inspired delta network algorithm to exploit temporal sparsity in RNNs. Weights are stored in inexpensive DRAM, which enables EdgeDRNN to compute large multi-layer RNNs on the most inexpensive FPGAs. The sparse updates reduce DRAM weight-memory access by a factor of up to 10X, and the delta threshold can be varied dynamically to trade off latency against accuracy. EdgeDRNN updates a 5 million parameter 2-layer GRU-RNN in about 0.5 ms. It achieves latency comparable with a 92 W NVIDIA 1080 GPU, and outperforms the NVIDIA Jetson Nano, Jetson TX2, and Intel Neural Compute Stick 2 in latency by 5X. For a batch size of 1, EdgeDRNN achieves a mean effective throughput of 20.2 GOp/s and a wall-plug power efficiency over 4X higher than that of commercial edge AI platforms.
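The core idea of the delta network algorithm described above is that a weight column only needs to be fetched and multiplied when its corresponding input element has changed by more than a threshold (the "delta") since it was last used; sub-threshold inputs keep a memorized value. The following pure-Python sketch illustrates this temporally sparse matrix-vector product under our own simplifying assumptions (dense list-of-rows weights, a single accumulated pre-activation state); it is an illustration of the general technique, not EdgeDRNN's hardware datapath:

```python
def delta_matvec(W, x, x_prev, state, theta):
    """Delta-network-style sparse matrix-vector product (sketch).

    W:      weight matrix as a list of rows
    x:      current input vector
    x_prev: memorized input vector from the previous timestep
    state:  running pre-activation accumulator (sum of W @ dx over time)
    theta:  delta threshold controlling the sparsity/accuracy trade-off
    """
    # Change in each input since its memorized value
    dx = [xi - pi for xi, pi in zip(x, x_prev)]
    # Only above-threshold inputs trigger a weight-column fetch
    active = [j for j, d in enumerate(dx) if abs(d) > theta]
    # Accumulate contributions of the active columns only
    new_state = [
        s + sum(W[i][j] * dx[j] for j in active)
        for i, s in enumerate(state)
    ]
    # Memorize: sub-threshold inputs keep their previous value
    x_mem = [x[j] if j in active else x_prev[j] for j in range(len(x))]
    return new_state, x_mem
```

With `theta = 0` and a zero initial state this reduces to a dense matvec; raising `theta` skips more columns, which is the mechanism behind the up-to-10X reduction in DRAM weight fetches, at the cost of approximation error. Varying `theta` at run time gives the dynamic latency/accuracy trade-off mentioned in the abstract.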
