Scaling Up Deep Neural Network Optimization for Edge Inference

Paper title

Scaling Up Deep Neural Network Optimization for Edge Inference

Authors

Bingqian Lu, Jianyi Yang, Shaolei Ren

Abstract

Deep neural networks (DNNs) are increasingly deployed on and integrated with edge devices such as mobile phones, drones, robots, and wearables. To run DNN inference directly on edge devices (a.k.a. edge inference) with satisfactory performance, optimizing the DNN design (e.g., network architecture and quantization policy) is crucial. While state-of-the-art DNN designs have leveraged performance predictors to speed up the optimization process, these predictors are device-specific (i.e., each predictor serves only one target device) and hence cannot scale well in the presence of extremely diverse edge devices. Moreover, even with performance predictors, the optimizer (e.g., search-based optimization) can still be time-consuming when optimizing DNNs for many different devices. In this work, we propose two approaches to scaling up DNN optimization. In the first approach, we reuse the performance predictors built on a proxy device and leverage performance monotonicity to scale up DNN optimization without rebuilding performance predictors for each different device. In the second approach, we build scalable performance predictors that estimate the resulting performance (e.g., inference accuracy, latency, or energy) for a given DNN-device pair, and we use a neural network-based automated optimizer that takes both device features and optimization parameters as input and directly outputs the optimal DNN design, without going through a lengthy optimization process for each individual device.
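To make the first approach more concrete, the following is a minimal Python sketch of how performance monotonicity could let a proxy device stand in for a target device. It is an illustration under stated assumptions, not the authors' algorithm: the names `pareto_front`, `pick_for_target`, and `measure_on_target`, the tuple layout of a design, and the single latency budget are all hypothetical. The assumption is that if one design is slower than another on the proxy device, it is also slower on the target device, so the accuracy/latency Pareto front found on the proxy stays ordered on the target and only a few target-side measurements are needed.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical design record: (name, accuracy, latency predicted on the proxy device in ms)
Design = Tuple[str, float, float]


def pareto_front(designs: List[Design]) -> List[Design]:
    """Keep designs that are not dominated in (higher accuracy, lower proxy latency)."""
    # Sort by proxy latency ascending; on ties, put the more accurate design first.
    ordered = sorted(designs, key=lambda d: (d[2], -d[1]))
    front: List[Design] = []
    best_acc = float("-inf")
    for d in ordered:
        if d[1] > best_acc:  # strictly more accurate than every faster design
            front.append(d)
            best_acc = d[1]
    return front


def pick_for_target(
    front: List[Design],
    measure_on_target: Callable[[str], float],
    budget_ms: float,
) -> Optional[Design]:
    """Binary-search the proxy Pareto front for the most accurate design within budget.

    Assumes monotonicity: the latency ordering on the proxy device is preserved
    on the target device, so target latency is non-decreasing along the front.
    """
    lo, hi = 0, len(front) - 1
    best: Optional[Design] = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if measure_on_target(front[mid][0]) <= budget_ms:
            best = front[mid]  # feasible; try a slower, more accurate design
            lo = mid + 1
        else:
            hi = mid - 1  # too slow on the target; try a faster design
    return best
```

With this structure, the number of measurements on the target device grows logarithmically with the size of the front, and no per-device predictor is rebuilt. If monotonicity holds only approximately, the binary search may miss the best feasible design, so this sketch should be read as the idealized case.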
