Paper Title
Offloading and Resource Allocation with General Task Graph in Mobile Edge Computing: A Deep Reinforcement Learning Approach
Paper Authors
Paper Abstract
In this paper, we consider a mobile-edge computing system, where an access point assists a mobile device (MD) in executing an application consisting of multiple tasks following a general task call graph. The objective is to jointly determine the offloading decision of each task and the resource allocation under time-varying wireless fading channels and stochastic edge computing capability, so that the energy-time cost (ETC) of the MD is minimized. Solving the problem is particularly hard due to the combinatorial offloading decisions and the strong coupling among task executions under the general dependency model. Conventional numerical optimization methods are inefficient for solving such a problem, especially when the problem size is large. To address the issue, we propose a deep reinforcement learning (DRL) framework based on the actor-critic learning structure. In particular, the actor network utilizes a DNN to learn the optimal mapping from the input states to the binary offloading decision of each task. Meanwhile, by analyzing the structure of the optimal solution, we derive a low-complexity algorithm for the critic network to quickly evaluate the ETC performance of the offloading decisions output by the actor network. With the low-complexity critic network, we can quickly select the best offloading action and subsequently store the state-action pair in an experience replay memory as the training dataset to continuously improve the action-generation DNN. To further reduce the complexity, we show that the optimal offloading decision exhibits a one-climb structure, which can be utilized to significantly reduce the search space of action generation. Numerical results show that for various types of task graphs, the proposed algorithm achieves up to $99.1\%$ of the optimal performance while significantly reducing the computational complexity compared to the existing optimization methods.
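To make the described actor-critic loop concrete, the following is a minimal sketch (not the authors' code) of how such a framework could be wired up in PyTorch: an actor DNN maps the observed state to per-task offloading probabilities, a set of one-climb candidate binary actions is scored by a critic routine, and the best state-action pairs are replayed to train the DNN. The state layout, network sizes, the `evaluate_etc()` placeholder (which stands in for the paper's low-complexity optimal resource-allocation evaluator), and the contiguous-block reading of the one-climb structure (exact only for a linear task chain) are all illustrative assumptions.

```python
# Hedged sketch of the actor-critic offloading loop described in the abstract.
import itertools
import random
import torch
import torch.nn as nn

NUM_TASKS = 8               # tasks in the task graph (assumed)
STATE_DIM = 2 * NUM_TASKS   # e.g. per-task channel gains + edge CPU availability (assumed)

# Actor network: maps the input state to a relaxed offloading decision in [0, 1]^N.
actor = nn.Sequential(
    nn.Linear(STATE_DIM, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, NUM_TASKS), nn.Sigmoid(),
)
optimizer = torch.optim.Adam(actor.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()
replay_memory = []          # stores (state, best_action) training pairs

def evaluate_etc(state, action):
    """Placeholder critic: returns the ETC of one binary offloading vector.
    In the paper this is a low-complexity optimal resource-allocation routine."""
    return float(torch.rand(1))   # stand-in cost; replace with the real evaluator

def one_climb_candidates(probs, k=16):
    """Candidate binary actions whose offloaded tasks form one contiguous block
    (a simplified 'one-climb' structure for a task chain), keeping the k
    candidates closest to the actor's relaxed output."""
    cands = {tuple([0] * NUM_TASKS)}
    for i, j in itertools.combinations(range(NUM_TASKS + 1), 2):
        cands.add(tuple(1 if i <= t < j else 0 for t in range(NUM_TASKS)))
    ranked = sorted(cands, key=lambda a: sum((p - b) ** 2
                                             for p, b in zip(probs.tolist(), a)))
    return [torch.tensor(a, dtype=torch.float32) for a in ranked[:k]]

for step in range(1000):                     # online loop over fading realizations
    state = torch.rand(STATE_DIM)            # observed channel / edge-CPU state (assumed)
    probs = actor(state)                     # relaxed per-task offloading probabilities
    candidates = one_climb_candidates(probs)
    best_action = min(candidates, key=lambda a: evaluate_etc(state, a))
    replay_memory.append((state, best_action))

    if len(replay_memory) >= 32:             # periodically refit the actor DNN
        batch = random.sample(replay_memory, 32)
        s = torch.stack([b[0] for b in batch])
        a = torch.stack([b[1] for b in batch])
        optimizer.zero_grad()
        loss = loss_fn(actor(s), a)          # cross-entropy against the selected best actions
        loss.backward()
        optimizer.step()
```

The key design point mirrored here is that the critic is not a learned value network but a fast analytical evaluator, so the best action among the (one-climb-pruned) candidates can be identified exactly at every step and used as a supervised label for the actor DNN.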