CUTIE: Beyond PetaOp/s/W Ternary DNN Inference Acceleration with Better-than-Binary Energy Efficiency

Paper title

CUTIE: Beyond PetaOp/s/W Ternary DNN Inference Acceleration with Better-than-Binary Energy Efficiency

Authors

Moritz Scherer, Georg Rutishauser, Lukas Cavigelli, Luca Benini

Abstract

We present a 3.1 POp/s/W fully digital hardware accelerator for ternary neural networks. CUTIE, the Completely Unrolled Ternary Inference Engine, focuses on minimizing non-computational energy and switching activity so that dynamic power spent on storing (locally or globally) intermediate results is minimized. This is achieved by 1) a data path architecture completely unrolled in the feature map and filter dimensions to reduce switching activity by favoring silencing over iterative computation and maximizing data re-use, 2) targeting ternary neural networks which, in contrast to binary NNs, allow for sparse weights which reduce switching activity, and 3) introducing an optimized training method for higher sparsity of the filter weights, resulting in a further reduction of the switching activity. Compared with state-of-the-art accelerators, CUTIE achieves greater or equal accuracy while decreasing the overall core inference energy cost by a factor of 4.8x-21x.
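The link the abstract draws between ternary weights, weight sparsity, and switching activity can be illustrated with a small sketch. Everything here is an assumption made for illustration: the threshold-based `ternarize` function is a generic ternarization rule, not the optimized training method the paper introduces, and counting nonzero products in `ternary_dot` is only a crude stand-in for switching activity, not a power model of CUTIE.

```python
def ternarize(weights, threshold):
    """Map real-valued weights to {-1, 0, +1} using a symmetric threshold.

    Hypothetical rule for illustration; the paper's training method is not
    described in the abstract.
    """
    return [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]


def sparsity(ternary_weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in ternary_weights if w == 0) / len(ternary_weights)


def ternary_dot(activations, ternary_weights):
    """Dot product of ternary activations and ternary weights.

    Returns the accumulated sum and the number of nonzero products,
    used here as a rough proxy for how many terms would toggle an adder.
    """
    acc = 0
    nonzero_terms = 0
    for a, w in zip(activations, ternary_weights):
        product = a * w
        if product != 0:
            acc += product
            nonzero_terms += 1
    return acc, nonzero_terms


weights = ternarize([0.9, -0.05, -0.7, 0.02], threshold=0.1)
total, active = ternary_dot([1, -1, 0, 1], weights)
print(weights, sparsity(weights), total, active)
```

A binary network has no zero weight value, so every product contributes to the sum. In the ternary case, a zero weight or zero activation drops that term from the accumulation, which is the effect the abstract attributes to sparse weights.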
