AutoDiCE: Fully Automated Distributed CNN Inference at the Edge

Paper Title

AutoDiCE: Fully Automated Distributed CNN Inference at the Edge

Authors

Xiaotian Guo, Andy D. Pimentel, Todor Stefanov

Abstract

Deep Learning approaches based on Convolutional Neural Networks (CNNs) are extensively utilized and very successful in a wide range of application areas, including image classification and speech recognition. For the execution of trained CNNs, i.e. model inference, we nowadays witness a shift from the Cloud to the Edge. Unfortunately, deploying and inferring large, compute and memory intensive CNNs on edge devices is challenging because these devices typically have limited power budgets and compute/memory resources. One approach to address this challenge is to leverage all available resources across multiple edge devices to deploy and execute a large CNN by properly partitioning the CNN and running each CNN partition on a separate edge device. Although such distribution, deployment, and execution of large CNNs on multiple edge devices is a desirable and beneficial approach, there currently does not exist a design and programming framework that takes a trained CNN model, together with a CNN partitioning specification, and fully automates the CNN model splitting and deployment on multiple edge devices to facilitate distributed CNN inference at the Edge. Therefore, in this paper, we propose a novel framework, called AutoDiCE, for automated splitting of a CNN model into a set of sub-models and automated code generation for distributed and collaborative execution of these sub-models on multiple, possibly heterogeneous, edge devices, while supporting the exploitation of parallelism among and within the edge devices. Our experimental results show that AutoDiCE can deliver distributed CNN inference with reduced energy consumption and memory usage per edge device, and improved overall system throughput at the same time.
