QUIDAM: A Framework for Quantization-Aware DNN Accelerator and Model Co-Exploration

Paper title

QUIDAM: A Framework for Quantization-Aware DNN Accelerator and Model Co-Exploration

Authors

Inci, Ahmet; Virupaksha, Siri Garudanagiri; Jain, Aman; Chin, Ting-Wu; Thallam, Venkata Vivek; Ding, Ruizhou; Marculescu, Diana

Abstract

As the machine learning and systems communities strive to achieve higher energy efficiency through custom deep neural network (DNN) accelerators, varied precision or quantization levels, and model compression techniques, there is a need for design space exploration frameworks that incorporate quantization-aware processing elements into the accelerator design space while having accurate and fast power, performance, and area models. In this work, we present QUIDAM, a highly parameterized quantization-aware DNN accelerator and model co-exploration framework. Our framework can facilitate future research on design space exploration of DNN accelerators for various design choices such as bit precision, processing element type, scratchpad sizes of processing elements, global buffer size, number of total processing elements, and DNN configurations. Our results show that different bit precisions and processing element types lead to significant differences in terms of performance per area and energy. Specifically, our framework identifies a wide range of design points where performance per area and energy vary by more than 5x and 35x, respectively. With the proposed framework, we show that lightweight processing elements achieve on-par accuracy and up to 5.7x improvement in performance per area and energy when compared to the best INT16-based implementation. Finally, due to the efficiency of the pre-characterized power, performance, and area models, QUIDAM can speed up the design exploration process by 3-4 orders of magnitude as it removes the need for expensive synthesis and characterization of each design.
