Paper Title
QONNX: Representing Arbitrary-Precision Quantized Neural Networks
Paper Authors
Paper Abstract
We present extensions to the Open Neural Network Exchange (ONNX) intermediate representation format to represent arbitrary-precision quantized neural networks. We first introduce support for low precision quantization in existing ONNX-based quantization formats by leveraging integer clipping, resulting in two new backward-compatible variants: the quantized operator format with clipping and quantize-clip-dequantize (QCDQ) format. We then introduce a novel higher-level ONNX format called quantized ONNX (QONNX) that introduces three new operators -- Quant, BipolarQuant, and Trunc -- in order to represent uniform quantization. By keeping the QONNX IR high-level and flexible, we enable targeting a wider variety of platforms. We also present utilities for working with QONNX, as well as examples of its usage in the FINN and hls4ml toolchains. Finally, we introduce the QONNX model zoo to share low-precision quantized neural networks.
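To make the integer-clipping idea concrete, below is a minimal NumPy sketch of uniform quantization restricted to an arbitrary bit width by clipping the integer representation, which is the pattern the QCDQ format expresses with QuantizeLinear, Clip, and DequantizeLinear nodes. The function and argument names here are illustrative only and are not the QONNX or ONNX operator API.

```python
import numpy as np

def quant_clip_dequant(x, scale, zero_point, bitwidth, signed=True):
    """Uniform quantization with integer clipping, then dequantization.

    Sketch of the quantize-clip-dequantize (QCDQ) idea described in the
    abstract: clipping the integer values lets an 8-bit quantize/dequantize
    pair represent precisions below 8 bits. Names are illustrative, not the
    QONNX operator interface.
    """
    if signed:
        qmin, qmax = -(2 ** (bitwidth - 1)), 2 ** (bitwidth - 1) - 1
    else:
        qmin, qmax = 0, 2 ** bitwidth - 1
    q = np.round(x / scale) + zero_point   # quantize to integers
    q = np.clip(q, qmin, qmax)             # clip to the low-precision integer range
    return (q - zero_point) * scale        # dequantize back to real values

# Example: 4-bit signed quantization of a small tensor
x = np.array([-1.3, -0.2, 0.0, 0.41, 0.9], dtype=np.float32)
print(quant_clip_dequant(x, scale=0.1, zero_point=0, bitwidth=4))
# -> [-0.8 -0.2  0.   0.4  0.7]; out-of-range values saturate at the clip bounds
```

With bitwidth=4 and scale=0.1, inputs land on a 16-level grid spanning [-0.8, 0.7]; the same clipping step is what lets the backward-compatible QCDQ and clipped quantized-operator formats encode bit widths below those natively supported by ONNX.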