Paper Title

A Note on Latency Variability of Deep Neural Networks for Mobile Inference

Paper Authors

Luting Yang, Bingqian Lu, Shaolei Ren

Paper Abstract

Running deep neural network (DNN) inference on mobile devices, i.e., mobile inference, has become a growing trend, making inference less dependent on network connections and keeping private data local. Prior studies on optimizing DNNs for mobile inference typically focus on the metric of average inference latency, thus implicitly assuming that mobile inference exhibits little latency variability. In this note, we conduct a preliminary measurement study on the latency variability of DNNs for mobile inference. We show that inference latency variability can become quite significant in the presence of CPU resource contention. More interestingly, contrary to the common belief that the relative performance superiority of one DNN over another on a given device carries over to another device and/or another level of resource contention, we highlight that a DNN model with better latency performance than another model can be outperformed by that model when resource contention becomes more severe or when running on a different device. Thus, when optimizing DNN models for mobile inference, measuring only the average latency may not be adequate; instead, latency variability should be accounted for under various conditions, including but not limited to the different devices and different levels of CPU resource contention considered in this note.
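The abstract's core argument is that the mean latency alone can hide significant variability. Below is a minimal sketch, not from the paper, of how such a measurement might be set up: it times repeated calls to an inference function and reports the mean alongside tail percentiles, so that a model looking fine on average but poor at the tail would be exposed. The `dummy_inference` workload is a hypothetical stand-in for a real DNN runtime call.

```python
import statistics
import time

def measure_latency(infer, n_runs=100):
    """Time repeated calls to an inference callable and summarize
    both the mean and the tail of the latency distribution (ms)."""
    samples = []
    for _ in range(n_runs):
        start = time.perf_counter()
        infer()
        samples.append((time.perf_counter() - start) * 1e3)
    samples.sort()
    # Report mean together with median and tail percentiles;
    # a large p99/p50 gap signals high latency variability.
    return {
        "mean_ms": statistics.mean(samples),
        "p50_ms": samples[int(0.50 * (n_runs - 1))],
        "p95_ms": samples[int(0.95 * (n_runs - 1))],
        "p99_ms": samples[int(0.99 * (n_runs - 1))],
    }

# Hypothetical placeholder standing in for a real DNN inference call.
def dummy_inference():
    sum(i * i for i in range(10_000))

print(measure_latency(dummy_inference))
```

To mimic the paper's setting, the same measurement would be repeated while a background process contends for CPU, and across devices, comparing the full per-condition distributions rather than a single average.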
