Paper Title

EcoFlow: Efficient Convolutional Dataflows for Low-Power Neural Network Accelerators

Authors

Lois Orosa, Skanda Koppula, Yaman Umuroglu, Konstantinos Kanellopoulos, Juan Gomez-Luna, Michaela Blott, Kees Vissers, Onur Mutlu

Abstract


Dilated and transposed convolutions are widely used in modern convolutional neural networks (CNNs). These kernels are used extensively during CNN training and inference of applications such as image segmentation and high-resolution image generation. Although these kernels have grown in popularity, they stress current compute systems due to their high memory intensity, exascale compute demands, and large energy consumption. We find that commonly-used low-power CNN inference accelerators based on spatial architectures are not optimized for both of these convolutional kernels. Dilated and transposed convolutions introduce significant zero padding when mapped to the underlying spatial architecture, significantly degrading performance and energy efficiency. Existing approaches that address this issue require significant design changes to the otherwise simple, efficient, and well-adopted architectures used to compute direct convolutions. To address this challenge, we propose EcoFlow, a new set of dataflows and mapping algorithms for dilated and transposed convolutions. These algorithms are tailored to execute efficiently on existing low-cost, small-scale spatial architectures and require minimal changes to the network-on-chip of existing accelerators. EcoFlow eliminates zero padding through careful dataflow orchestration and data mapping tailored to the spatial architecture. EcoFlow enables flexible and high-performance transposed and dilated convolutions on architectures that are otherwise optimized for CNN inference. We evaluate the efficiency of EcoFlow on CNN training workloads and Generative Adversarial Network (GAN) training workloads. Experiments in our new cycle-accurate simulator show that EcoFlow 1) reduces end-to-end CNN training time between 7-85%, and 2) improves end-to-end GAN training performance between 29-42%, compared to state-of-the-art CNN inference accelerators.
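To make the zero-padding problem concrete, below is a minimal NumPy sketch (not from the paper; all function names are our own) of the standard way a strided transposed convolution is lowered to a direct convolution: zeros are inserted between input elements, so a large fraction of the multiply-accumulates performed by a direct-convolution accelerator operate on zeros — exactly the wasted work that EcoFlow's dataflows are designed to eliminate.

```python
import numpy as np

def zero_insert(x, stride):
    """Insert (stride - 1) zeros between input elements, as done when
    lowering a transposed convolution to a direct convolution."""
    n = len(x)
    out = np.zeros(n + (n - 1) * (stride - 1))
    out[::stride] = x
    return out

def conv1d(x, w):
    """Plain 'valid' direct convolution (cross-correlation form)."""
    k = len(w)
    return np.array([np.dot(x[i:i + k], w) for i in range(len(x) - k + 1)])

x = np.array([1.0, 2.0, 3.0, 4.0])   # toy input
w = np.array([1.0, 1.0, 1.0])        # toy 1x3 kernel

# Stride-2 transposed convolution = direct convolution over zero-inserted input.
up = zero_insert(x, stride=2)        # [1, 0, 2, 0, 3, 0, 4]
y = conv1d(np.pad(up, len(w) - 1), w)

# Fraction of input operands that are the inserted zeros (wasted MACs).
zero_fraction = 1.0 - np.count_nonzero(up) / len(up)
print(zero_fraction)  # ~0.43 even for stride 2; higher strides waste more
```

At stride s, roughly (s-1)/s of the lowered input is zeros, which is why naively mapping these kernels onto a direct-convolution spatial array degrades both performance and energy efficiency.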
