Paper Title

Row-wise Accelerator for Vision Transformer

Paper Authors

Hong-Yi Wang, Tian-Sheuan Chang

Paper Abstract

Following the success of natural language processing, transformers for vision applications have attracted significant attention in recent years due to their excellent performance. However, existing deep learning hardware accelerators for vision cannot execute this structure efficiently because of significant differences in model architecture. As a result, this paper proposes a hardware accelerator for vision transformers with row-wise scheduling, which decomposes the major operations in vision transformers into a single dot-product primitive for unified and efficient execution. Furthermore, by sharing weights across columns, we can reuse data and reduce memory usage. The implementation in TSMC 40nm CMOS technology requires only a 262K gate count and a 149KB SRAM buffer to deliver 403.2 GOPS of throughput at a 600MHz clock frequency.
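As a rough illustration of the decomposition idea described in the abstract, the minimal NumPy sketch below computes a transformer matrix multiplication one output row at a time, with every output element produced by a single dot-product primitive and each fetched input row reused across all weight columns. The function name `row_wise_matmul`, the toy ViT-like shapes (197 tokens, 768-dim embedding), and the software loop structure are assumptions made for this example; they illustrate the general idea in software and do not describe the paper's actual hardware scheduling, dataflow, or weight-sharing scheme.

```python
import numpy as np

def row_wise_matmul(A, B):
    """Compute A @ B one output row at a time.

    Each output element is a single dot product; the input row A[i]
    is fetched once and reused across all columns of B. This is a
    software sketch of the general row-wise decomposition, not a
    model of the accelerator's hardware dataflow.
    """
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    out = np.empty((M, N), dtype=A.dtype)
    for i in range(M):                         # one output row per step
        row = A[i]                             # row fetched once ...
        for j in range(N):                     # ... reused for all N columns
            out[i, j] = np.dot(row, B[:, j])   # single dot-product primitive
    return out

# Toy ViT-like shapes (hypothetical): 197 tokens (196 patches + CLS),
# 768-dim embedding, and one projection matrix.
tokens = np.random.randn(197, 768).astype(np.float32)
W_proj = np.random.randn(768, 768).astype(np.float32)

out = row_wise_matmul(tokens, W_proj)
assert np.allclose(out, tokens @ W_proj, atol=1e-2)  # matches a plain matmul
```

Because self-attention projections, attention-score and value matmuls, and the MLP layers of a vision transformer are all matrix multiplications of this form, expressing them through one dot-product primitive is what allows a single unified datapath to execute all of them.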
