Paper Title
Multi-user Co-inference with Batch Processing Capable Edge Server
Paper Authors
Paper Abstract
Graphics processing units (GPUs) can improve deep neural network inference throughput via batch processing, where multiple tasks are processed concurrently. We focus on a novel scenario in which energy-constrained mobile devices offload inference tasks to a GPU-equipped edge server. Each inference task is partitioned into sub-tasks for a finer granularity of offloading and scheduling, and we investigate the problem of minimizing user energy consumption under inference latency constraints. To deal with the coupling between offloading and scheduling introduced by concurrent batch processing, we first consider an offline problem with a constant edge inference latency and identical latency constraints. It is proven that optimizing the offloading policy of each user independently and aggregating all identical sub-tasks into one batch is optimal, which inspires the independent partitioning and same sub-task aggregating (IP-SSA) algorithm. Further, the optimal grouping (OG) algorithm is proposed to group tasks optimally when the latency constraints differ. Finally, when future task arrivals cannot be precisely predicted, a deep deterministic policy gradient (DDPG) agent is trained to call OG. Experiments show that IP-SSA reduces user energy consumption by up to 94.9% in the offline setting, while DDPG-OG outperforms DDPG-IP-SSA by up to 8.92% in the online setting.
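The two-step structure that the abstract proves optimal — each user independently picks its own partition point, and the server then batches identical sub-tasks across users — can be summarized in a few lines. Below is a minimal Python sketch of the IP-SSA idea, assuming hypothetical per-user energy and latency profiles; the `User` fields, the cost model, and the split-point enumeration are illustrative placeholders, not the paper's actual formulation.

```python
# Minimal sketch of the IP-SSA idea (illustrative, not the paper's formulation).
from dataclasses import dataclass

@dataclass
class User:
    name: str
    local_energy: list[float]  # hypothetical: energy to compute layers 0..k-1 locally
    tx_energy: list[float]     # hypothetical: energy to upload the activation after layer k
    latency: list[float]       # hypothetical: end-to-end latency if split at layer k
    deadline: float            # common latency constraint (offline, identical-deadline case)

def best_partition(u: User) -> int:
    """Independent partitioning: each user picks the split point that minimizes
    its own energy among deadline-feasible options (assumes one exists)."""
    feasible = [k for k in range(len(u.local_energy)) if u.latency[k] <= u.deadline]
    return min(feasible, key=lambda k: u.local_energy[k] + u.tx_energy[k])

def ip_ssa(users: list[User]) -> dict[int, list[str]]:
    """Same sub-task aggregation: users whose offloaded portions start at the
    same layer are grouped so the edge GPU can batch identical sub-tasks."""
    batches: dict[int, list[str]] = {}
    for u in users:
        batches.setdefault(best_partition(u), []).append(u.name)
    return batches

if __name__ == "__main__":
    u1 = User("u1", [0.0, 2.0, 5.0], [6.0, 3.0, 0.5], [4.0, 3.0, 5.0], deadline=4.0)
    u2 = User("u2", [0.0, 1.5, 4.0], [5.0, 2.5, 0.4], [3.5, 2.8, 4.5], deadline=4.0)
    print(ip_ssa([u1, u2]))  # {1: ['u1', 'u2']}: both pick split k=1, so one batch
```

In this toy run both users independently settle on the same split point, so their offloaded sub-tasks land in one batch; the paper's OG and DDPG-OG components extend this to heterogeneous deadlines and unpredictable arrivals, which the sketch does not model.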