Paper Title


Optimizing Temporal Convolutional Network inference on FPGA-based accelerators

Paper Authors

Marco Carreras, Gianfranco Deriu, Luigi Raffo, Luca Benini, Paolo Meloni

Paper Abstract


Convolutional Neural Networks are extensively used in a wide range of applications, commonly including computer vision tasks like image and video classification, recognition, and segmentation. Recent research results demonstrate that multilayer (deep) networks involving mono-dimensional convolutions and dilation can be effectively used in time series and sequence classification and segmentation, as well as in tasks involving sequence modelling. These structures, commonly referred to as Temporal Convolutional Networks (TCNs), have been demonstrated to consistently outperform Recurrent Neural Networks in terms of accuracy and training time [1]. While FPGA-based inference accelerators for classic CNNs are widespread, the literature lacks a quantitative evaluation of their usability for inference on TCN models. In this paper we present such an evaluation, considering a CNN accelerator with specific features supporting TCN kernels as a reference and a set of state-of-the-art TCNs as a benchmark. Experimental results show that, during TCN execution, operational intensity can be critical for the overall performance. We propose a convolution scheduling based on batch processing that can boost efficiency up to 96% of theoretical peak performance. Overall, we achieve up to 111.8 GOPS and a power efficiency of 33.9 GOPS/W on an Ultrascale+ ZU3EG (up to a 10x speedup and a 3x power-efficiency improvement with respect to a pure software implementation).
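The building block the abstract refers to — a mono-dimensional convolution with dilation — can be sketched in a few lines. This is a generic illustration of a dilated causal 1-D convolution as used in TCN layers, not code from the paper; the function name and the simple loop formulation are my own.

```python
import numpy as np

def dilated_causal_conv1d(x, w, dilation=1):
    """Causal 1-D convolution with dilation (illustrative TCN layer primitive).

    x: input sequence, shape (T,); w: kernel, shape (K,).
    Output y[t] = sum_k w[k] * x[t - k*dilation], with zero padding on
    the left so that y[t] depends only on past and present inputs.
    """
    T, K = len(x), len(w)
    pad = (K - 1) * dilation           # left padding keeps the conv causal
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    y = np.empty(T)
    for t in range(T):
        # taps reach back in time at stride `dilation`
        y[t] = sum(w[k] * xp[pad + t - k * dilation] for k in range(K))
    return y
```

Stacking such layers with exponentially growing dilation (1, 2, 4, ...) gives a TCN a receptive field that grows exponentially with depth while each layer remains a cheap 1-D convolution.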
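Why does batch processing help? Weights are fetched from memory once and reused across every sequence in the batch, so a larger batch amortizes weight traffic and raises the operations-per-byte ratio. The rough cost model below is my own illustration of this effect, not the paper's analysis; parameter names and the 16-bit data assumption are mine.

```python
def conv1d_operational_intensity(T, K, C_in, C_out, batch=1, bytes_per_elem=2):
    """Rough operational intensity (ops/byte) of one 1-D conv layer.

    Illustrative model: counts multiply-accumulates as 2 ops and assumes
    activations are streamed per sequence while weights are loaded once
    per batch (16-bit data by default).
    """
    macs = 2 * batch * T * K * C_in * C_out               # total compute
    act_bytes = batch * (T * C_in + T * C_out) * bytes_per_elem  # in + out
    w_bytes = K * C_in * C_out * bytes_per_elem           # fetched once
    return macs / (act_bytes + w_bytes)
```

Under this model, intensity rises monotonically with the batch size and saturates at the activation-bound limit, which is consistent with the abstract's claim that batch-based scheduling pushes efficiency toward the theoretical peak.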
