Paper Title


Fairness Based Energy-Efficient 3D Path Planning of a Portable Access Point: A Deep Reinforcement Learning Approach

Authors

Babu, Nithin, Donevski, Igor, Valcarce, Alvaro, Popovski, Petar, Nielsen, Jimmy Jessen, Papadias, Constantinos B.

Abstract


In this work, we optimize the 3D trajectory of an unmanned aerial vehicle (UAV)-based portable access point (PAP) that provides wireless services to a set of ground nodes (GNs). Moreover, as per the Peukert effect, we consider pragmatic non-linear battery discharge for the battery of the UAV. Thus, we formulate the problem in a novel manner that represents the maximization of a fairness-based energy efficiency metric and is named fair energy efficiency (FEE). The FEE metric defines a system that lays importance on both the per-user service fairness and the energy efficiency of the PAP. The formulated problem takes the form of a non-convex problem with non-tractable constraints. To obtain a solution, we represent the problem as a Markov Decision Process (MDP) with continuous state and action spaces. Considering the complexity of the solution space, we use the twin delayed deep deterministic policy gradient (TD3) actor-critic deep reinforcement learning (DRL) framework to learn a policy that maximizes the FEE of the system. We perform two types of RL training to exhibit the effectiveness of our approach: the first (offline) approach keeps the positions of the GNs the same throughout the training phase; the second approach generalizes the learned policy to any arrangement of GNs by changing the positions of GNs after each training episode. Numerical evaluations show that neglecting the Peukert effect overestimates the air-time of the PAP and can be addressed by optimally selecting the PAP's flying speed. Moreover, the user fairness, energy efficiency, and hence the FEE value of the system can be improved by efficiently moving the PAP above the GNs. As such, we notice massive FEE improvements over baseline scenarios of up to 88.31%, 272.34%, and 318.13% for suburban, urban, and dense urban environments, respectively.
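The abstract notes that neglecting the Peukert effect overestimates the PAP's air-time. A minimal sketch of why, using Peukert's law with hypothetical battery parameters (this is not the paper's full rotor-power model): discharge time scales as a power of the drawn current, so a linear (exponent 1) model predicts longer flight at high currents than the non-linear model allows.

```python
# Sketch of Peukert's law for battery discharge time.
# t = H * (C / (I * H))**k, where C is rated capacity (Ah), H the rated
# discharge time (h), I the drawn current (A), and k >= 1 the Peukert exponent.
# All parameter values below are hypothetical, chosen only for illustration.
def peukert_air_time(capacity_ah, rated_hours, current_a, k):
    return rated_hours * (capacity_ah / (current_a * rated_hours)) ** k

# A linear model (k = 1) versus a Peukert model (k = 1.3) at a 10 A draw:
linear_h = peukert_air_time(5.0, 1.0, 10.0, 1.0)   # 0.5 h
peukert_h = peukert_air_time(5.0, 1.0, 10.0, 1.3)  # ~0.41 h: shorter air-time
```

Since the hover/flight current depends on speed, this is also why the abstract says the overestimation "can be addressed by optimally selecting the PAP's flying speed": flying at a speed that lowers the drawn current disproportionately extends air-time under the non-linear model.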
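The abstract defines FEE as a metric that weights both per-user service fairness and the PAP's energy efficiency, without giving the exact formula. One common way to compose such a metric, shown here purely as an illustrative assumption (the paper's actual FEE definition may differ), is to scale energy efficiency (bits delivered per joule) by Jain's fairness index over the per-GN rates:

```python
# Illustrative fairness-weighted energy efficiency; the function names and the
# exact composition are assumptions, not the paper's definition.
def jain_fairness(rates):
    # Jain's index: 1.0 when all users get equal rates, -> 1/n when one user
    # dominates.
    n = len(rates)
    total = sum(rates)
    return total * total / (n * sum(r * r for r in rates))

def fair_energy_efficiency(rates_bps, energy_joules):
    # Energy efficiency (sum-rate per joule) discounted by unfairness.
    return jain_fairness(rates_bps) * sum(rates_bps) / energy_joules

# Equal per-GN rates give fairness 1.0, so FEE equals plain bits-per-joule;
# skewed rates shrink the metric even at the same sum-rate.
balanced = fair_energy_efficiency([2e6, 2e6], 100.0)
skewed = fair_energy_efficiency([3.9e6, 0.1e6], 100.0)
```

Under this composition, a trajectory that hovers over one GN can score worse than one that moves efficiently above all GNs, matching the abstract's observation that moving the PAP above the GNs improves fairness, efficiency, and hence FEE together.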
