Paper Title

Optimising AI Training Deployments using Graph Compilers and Containers

Paper Authors

Nina Mujkanovic, Karthee Sivalingam, Alfio Lazzaro

Paper Abstract

Artificial Intelligence (AI) applications based on Deep Neural Networks (DNN) or Deep Learning (DL) have become popular due to their success in solving problems like image analysis and speech recognition. Training a DNN is computationally intensive, and High Performance Computing (HPC) has been a key driver of AI growth. Virtualisation and container technology have led to the convergence of cloud and HPC infrastructure. These infrastructures, with their diverse hardware, increase the complexity of deploying and optimising AI training workloads. AI training deployments in HPC or the cloud can be optimised with target-specific libraries, with graph compilers, and by improving data movement or IO. Graph compilers aim to optimise the execution of a DNN graph by generating optimised code for a target hardware/backend. As part of SODALITE (a Horizon 2020 project), the MODAK tool was developed to optimise application deployment in software-defined infrastructures. Using input from the data scientist and performance modelling, MODAK maps optimal application parameters to a target infrastructure and builds an optimised container. In this paper, we introduce MODAK and review container technologies and graph compilers for AI. We illustrate the optimisation of AI training deployments using graph compilers and Singularity containers. Evaluation using MNIST-CNN and ResNet50 training workloads shows that custom-built optimised containers outperform the official images from DockerHub. We also found that the performance of graph compilers depends on the target hardware and the complexity of the neural network.
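
As a concrete illustration of the abstract's point that graph compilers generate optimised code for a target hardware/backend, the sketch below enables XLA compilation for a small MNIST-style CNN training step in TensorFlow. This is a minimal example under assumed choices (TensorFlow 2.x, XLA, the model architecture, and the hyperparameters are all illustrative assumptions), not the paper's actual benchmark code.

```python
# Illustrative sketch only: enabling a graph compiler (XLA, via
# tf.function(jit_compile=True)) for an MNIST-style CNN training step.
# Framework version, compiler backend, model, and hyperparameters are
# assumptions for demonstration, not the paper's benchmark setup.
import tensorflow as tf

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0
y_train = y_train.astype("int64")

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()

# jit_compile=True asks XLA to compile the traced training graph into
# fused, target-specific kernels instead of executing op by op.
@tf.function(jit_compile=True)
def train_step(images, labels):
    with tf.GradientTape() as tape:
        logits = model(images, training=True)
        loss = loss_fn(labels, logits)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(128)
for images, labels in dataset.take(100):
    loss = train_step(images, labels)
```

In a MODAK-style workflow, a workload like this would then be run inside a custom-built Singularity container whose framework build is tuned for the target CPU or GPU, rather than inside a generic DockerHub image; the abstract describes this outcome but does not give the exact build recipes.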
