Paper Title

CUTIE: Beyond PetaOp/s/W Ternary DNN Inference Acceleration with Better-than-Binary Energy Efficiency

Authors

Moritz Scherer, Georg Rutishauser, Lukas Cavigelli, Luca Benini

Abstract

We present a 3.1 POp/s/W fully digital hardware accelerator for ternary neural networks. CUTIE, the Completely Unrolled Ternary Inference Engine, focuses on minimizing non-computational energy and switching activity so that dynamic power spent on storing (locally or globally) intermediate results is minimized. This is achieved by 1) a data path architecture completely unrolled in the feature map and filter dimensions to reduce switching activity by favoring silencing over iterative computation and maximizing data re-use, 2) targeting ternary neural networks which, in contrast to binary NNs, allow for sparse weights which reduce switching activity, and 3) introducing an optimized training method for higher sparsity of the filter weights, resulting in a further reduction of the switching activity. Compared with state-of-the-art accelerators, CUTIE achieves greater or equal accuracy while decreasing the overall core inference energy cost by a factor of 4.8x-21x.
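To make the abstract's second and third points concrete, the following is a minimal software sketch of why sparse ternary weights reduce work: any product with a zero operand contributes nothing and can be skipped (in hardware, this corresponds to data-path nodes that do not toggle). The threshold-based `ternarize` function is an illustrative assumption and is not the optimized training method the paper introduces; all function names here are hypothetical. The last helper only converts the quoted 3.1 POp/s/W figure into energy per operation (1 POp/s/W equals 10^15 operations per joule).

```python
def ternarize(weights, threshold):
    """Map real-valued weights to {-1, 0, +1}.

    Assumption: a simple symmetric threshold, used here only for illustration.
    """
    return [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]


def ternary_dot(activations, weights):
    """Dot product of two ternary vectors.

    Returns (result, active), where `active` counts the products whose
    operands are both nonzero, i.e. the only ones that do any work.
    """
    total = 0
    active = 0
    for a, w in zip(activations, weights):
        if a == 0 or w == 0:
            continue  # a zero operand contributes nothing to the sum
        total += a * w
        active += 1
    return total, active


def sparsity(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


def energy_per_op_femtojoules(peta_ops_per_joule):
    """Convert an efficiency in POp/s/W (= 1e15 Op/J) to femtojoules per operation."""
    return 1.0 / peta_ops_per_joule


if __name__ == "__main__":
    w = ternarize([0.9, -0.05, -0.7, 0.02], threshold=0.1)
    result, active = ternary_dot([1, -1, -1, 0], w)
    print(w, result, active, sparsity(w))
    print(round(energy_per_op_femtojoules(3.1), 4), "fJ/Op")
```

A software loop like this is sequential, whereas the accelerator described above is fully unrolled; the sketch only illustrates the counting argument, not the architecture.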
