Paper Title
Compiling ONNX Neural Network Models Using MLIR
Paper Authors
Paper Abstract
Deep neural network models are becoming increasingly popular and have been used in various tasks such as computer vision, speech recognition, and natural language processing. Machine learning models are commonly trained in a resource-rich environment and then deployed in a distinct environment such as high availability machines or edge devices. To assist the portability of models, the open-source community has proposed the Open Neural Network Exchange (ONNX) standard. In this paper, we present a high-level, preliminary report on our onnx-mlir compiler, which generates code for the inference of deep neural network models described in the ONNX format. Onnx-mlir is an open-source compiler implemented using the Multi-Level Intermediate Representation (MLIR) infrastructure recently integrated in the LLVM project. Onnx-mlir relies on the MLIR concept of dialects to implement its functionality. We propose here two new dialects: (1) an ONNX specific dialect that encodes the ONNX standard semantics, and (2) a loop-based dialect to provide for a common lowering point for all ONNX dialect operations. Each intermediate representation facilitates its own characteristic set of graph-level and loop-based optimizations respectively. We illustrate our approach by following several models through the proposed representations and we include some early optimization work and performance results.
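To make the ONNX input format concrete, below is a minimal Python sketch (not part of the paper) that uses the onnx helper API to build and save a one-node model; the graph name, tensor shapes, and output file name are illustrative assumptions. A compiler such as onnx-mlir would consume the resulting .onnx file and lower it through the dialects described above to generate inference code.

import onnx
from onnx import helper, TensorProto

# Build a tiny graph containing a single element-wise Add node.
node = helper.make_node("Add", inputs=["x", "y"], outputs=["sum"])
graph = helper.make_graph(
    [node],
    "tiny_add",  # illustrative graph name
    inputs=[
        helper.make_tensor_value_info("x", TensorProto.FLOAT, [2, 3]),
        helper.make_tensor_value_info("y", TensorProto.FLOAT, [2, 3]),
    ],
    outputs=[helper.make_tensor_value_info("sum", TensorProto.FLOAT, [2, 3])],
)

# Wrap the graph in a model, validate it, and serialize it to disk.
model = helper.make_model(graph)
onnx.checker.check_model(model)
onnx.save(model, "add.onnx")  # illustrative file name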