Paper Title
TFApprox: Towards a Fast Emulation of DNN Approximate Hardware Accelerators on GPU
Paper Authors
Paper Abstract
Energy efficiency of hardware accelerators of deep neural networks (DNN) can be improved by introducing approximate arithmetic circuits. In order to quantify the error introduced by using these circuits and avoid expensive hardware prototyping, a software emulator of the DNN accelerator is usually executed on a CPU or GPU. However, this emulation is typically two or three orders of magnitude slower than a software DNN implementation running on a CPU or GPU and operating with standard floating-point arithmetic instructions and common DNN libraries. The reason is that there is no hardware support for approximate arithmetic operations on common CPUs and GPUs, so these operations have to be expensively emulated. In order to address this issue, we propose an efficient emulation method for the approximate circuits utilized in a given DNN accelerator which is emulated on a GPU. All relevant approximate circuits are implemented as look-up tables and accessed through the texture memory mechanism of CUDA-capable GPUs. We exploit the fact that the texture memory is optimized for irregular read-only access and in some GPU architectures is even implemented as a dedicated cache. This technique allowed us to reduce the inference time of the emulated DNN accelerator approximately 200 times with respect to an optimized CPU version on complex DNNs such as ResNet. The proposed approach extends the TensorFlow library and is available online at https://github.com/ehw-fit/tf-approximate.
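The core idea described in the abstract can be illustrated with a small sketch: an 8-bit approximate multiplier is tabulated once into a 256×256 look-up table, and every multiplication during inference becomes a table read (on the GPU, the paper binds such a table to texture memory; the NumPy fancy-indexing below only mimics that lookup on the CPU). The `approx_mul8` circuit used here is a hypothetical stand-in, not the specific approximate multiplier evaluated in the paper.

```python
import numpy as np

def approx_mul8(a, b):
    """Hypothetical approximate 8-bit multiplier: exact product with the
    4 least-significant bits truncated (a common approximation style;
    a real accelerator would use a particular approximate circuit)."""
    return (a * b) & ~0xF

# Tabulate the circuit once for all 256 x 256 operand pairs.
A, B = np.meshgrid(np.arange(256, dtype=np.int32),
                   np.arange(256, dtype=np.int32), indexing="ij")
LUT = approx_mul8(A, B)  # shape (256, 256); LUT[a, b] == approx_mul8(a, b)

def emulated_dot(x, w):
    """Dot product of two uint8 vectors using only LUT reads,
    as the emulated accelerator would compute it."""
    return int(LUT[x, w].sum())

x = np.array([10, 200, 37], dtype=np.uint8)
w = np.array([3, 5, 120], dtype=np.uint8)
print(emulated_dot(x, w))                              # approximate result
print(int(np.dot(x.astype(int), w.astype(int))))       # exact reference
```

Because the table is read-only and accessed with data-dependent (irregular) indices, it maps naturally onto the texture-memory path the abstract describes.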