Paper Title

Auto-ViT-Acc: An FPGA-Aware Automatic Acceleration Framework for Vision Transformer with Mixed-Scheme Quantization

Authors

Zhengang Li, Mengshu Sun, Alec Lu, Haoyu Ma, Geng Yuan, Yanyue Xie, Hao Tang, Yanyu Li, Miriam Leeser, Zhangyang Wang, Xue Lin, Zhenman Fang

Abstract

Vision transformers (ViTs) are emerging with significantly improved accuracy in computer vision tasks. However, their complex architecture and enormous computation/storage demands impose urgent needs for new hardware accelerator design methodologies. This work proposes an FPGA-aware automatic ViT acceleration framework based on the proposed mixed-scheme quantization. To the best of our knowledge, this is the first FPGA-based ViT acceleration framework to explore model quantization. Compared with state-of-the-art ViT quantization work (an algorithmic approach only, without hardware acceleration), our quantization achieves 0.47% to 1.36% higher Top-1 accuracy under the same bit-width. Compared with the 32-bit floating-point baseline FPGA accelerator, our accelerator achieves around a 5.6x improvement in frame rate (i.e., 56.8 FPS vs. 10.0 FPS) with a 0.71% accuracy drop on the ImageNet dataset for DeiT-base.
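The abstract describes a mixed-scheme quantization, which in FPGA accelerator work typically combines a uniform fixed-point scheme with a power-of-two scheme (where multiplications reduce to bit shifts on hardware). The following is a minimal illustrative sketch of those two weight quantizers, not the paper's actual implementation; the function names, bit-width handling, and normalization choices are assumptions made for illustration.

```python
import numpy as np

def quantize_fixed_point(w, bits=4):
    """Uniform (fixed-point) quantization onto a symmetric signed grid.

    Illustrative only: per-tensor max scaling, no zero-point.
    """
    levels = 2 ** (bits - 1) - 1          # e.g. 7 levels per sign at 4 bits
    scale = np.max(np.abs(w)) / levels
    return np.round(w / scale) * scale

def quantize_power_of_two(w, bits=4):
    """Snap each weight to sign * 2^k (or zero).

    On an FPGA, multiplying by 2^k is a shift, saving DSP resources.
    Illustrative only: exponent range and zero threshold are assumptions.
    """
    peak = np.max(np.abs(w))
    mag = np.abs(w) / peak                # normalize magnitudes into (0, 1]
    min_exp = -(2 ** (bits - 1) - 2)      # reserve one code for zero
    with np.errstate(divide="ignore"):    # log2(0) -> -inf, clipped below
        exp = np.clip(np.round(np.log2(mag)), min_exp, 0)
    q = np.where(mag >= 2.0 ** (min_exp - 1),
                 np.sign(w) * 2.0 ** exp,  # nearest power of two
                 0.0)                      # tiny magnitudes snap to zero
    return q * peak                        # undo normalization
```

In the mixed scheme, different weight groups (e.g. rows of a weight matrix) would be assigned to one quantizer or the other so that the fixed-point portion maps to DSP multipliers and the power-of-two portion maps to LUT-based shifters.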
