Title
Compressing deep neural networks on FPGAs to binary and ternary precision with HLS4ML
Authors
Abstract
We present the implementation of binary and ternary neural networks in the hls4ml library, designed to automatically convert deep neural network models into digital circuits with FPGA firmware. Starting from benchmark models trained with floating-point precision, we investigate different strategies to reduce the network's resource consumption by reducing the numerical precision of the network parameters to binary or ternary. We discuss the trade-off between model accuracy and resource consumption. In addition, we show how to balance between latency and accuracy by retaining full precision on a selected subset of network components. As examples, we consider two multiclass classification tasks: handwritten digit recognition with the MNIST data set, and jet identification with simulated proton-proton collisions at the CERN Large Hadron Collider. The binary and ternary implementations achieve performance similar to that of the higher-precision implementation while using drastically fewer FPGA resources.
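The quantization schemes the abstract refers to can be illustrated with a minimal sketch. This is not hls4ml's actual implementation (which operates on trained Keras/QKeras models during firmware generation); it only shows the basic idea of mapping floating-point weights to binary {-1, +1} or ternary {-1, 0, +1} values, with the `threshold` parameter for ternarization chosen here purely for illustration:

```python
def binarize(weights):
    """Map each weight to +1 or -1 by its sign (0 maps to +1 here)."""
    return [1.0 if w >= 0 else -1.0 for w in weights]

def ternarize(weights, threshold=0.5):
    """Map each weight to {-1, 0, +1}: weights below the threshold
    in magnitude are zeroed, the rest keep only their sign."""
    return [0.0 if abs(w) < threshold else (1.0 if w > 0 else -1.0)
            for w in weights]

# Example: a small vector of trained floating-point weights.
w = [0.8, -0.3, 0.1, -1.2]
print(binarize(w))   # [1.0, -1.0, 1.0, -1.0]
print(ternarize(w))  # [1.0, 0.0, 0.0, -1.0]
```

With weights restricted to these values, multiply-accumulate operations reduce to additions, subtractions, and skips, which is what drives the drastic reduction in FPGA resource usage reported above.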