Paper Title
UAV Trajectory, User Association and Power Control for Multi-UAV Enabled Energy Harvesting Communications: Offline Design and Online Reinforcement Learning
Paper Authors
论文摘要
在本文中,我们考虑了多个太阳能无线节点,这些节点利用收获的太阳能将收集的数据传输到上行链路中的多个无人机(UAV)。在这种情况下,我们共同设计了无人机的飞行轨迹,无人机节点通信关联和上行链路控制,以有效利用收获的能量,并在有限的时间范围内管理共沟通干扰。为了确保无线节点的公平性,设计目标是最大化最差的用户速率。联合设计问题是高度非凸,需要因果(未来)了解瞬时能量状态信息(ESI)和渠道状态信息(CSI),这在现实中很难预测。为了克服这些挑战,我们提出了一种基于凸优化的离线方法,仅利用平均ESI和CSI。该问题通过连续凸近似(SCA)和替代优化的三个凸子问题解决。我们进一步设计了一种在线凸辅助增强学习(CARL)方法,以根据实时环境信息来改善系统性能。与传统的加固学习(RL)方法相比,提出了基于最佳离线无人机轨迹的多动力自动规范飞行走廊的想法,旨在避免使用无人机的不必要的飞行探索,并使我们能够提高学习效率和系统性能。计算机模拟用于验证所提出方法的有效性。提议的CARL方法可为离线和常规RL方法的最差用户率提供25%和12%的提高。
In this paper, we consider multiple solar-powered wireless nodes which utilize the harvested solar energy to transmit collected data to multiple unmanned aerial vehicles (UAVs) in the uplink. In this context, we jointly design the UAV flight trajectories, UAV-node communication associations, and uplink power control to effectively utilize the harvested energy and manage co-channel interference within a finite time horizon. To ensure fairness among the wireless nodes, the design goal is to maximize the worst user rate. The joint design problem is highly non-convex and requires non-causal (future) knowledge of the instantaneous energy state information (ESI) and channel state information (CSI), which is difficult to predict in reality. To overcome these challenges, we propose an offline method based on convex optimization that only utilizes the average ESI and CSI. The problem is decomposed into three convex subproblems and solved via successive convex approximation (SCA) and alternating optimization. We further design an online convex-assisted reinforcement learning (CARL) method to improve the system performance based on real-time environmental information. Based on the optimal offline UAV trajectories, we propose the idea of multi-UAV regulated flight corridors, which avoids unnecessary flight exploration by the UAVs and improves the learning efficiency and system performance compared with the conventional reinforcement learning (RL) method. Computer simulations are used to verify the effectiveness of the proposed methods. The proposed CARL method provides 25% and 12% improvements in the worst user rate over the offline and conventional RL methods, respectively.
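To make the "regulated flight corridor" idea concrete, below is a minimal sketch (not the authors' implementation) of how online RL exploration could be confined to a tube around the offline-optimized trajectory. All names (`build_corridor`, `project_to_corridor`), the per-waypoint spherical corridor geometry, and the numerical values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def build_corridor(offline_trajectory, radius):
    """Hypothetical helper: a corridor modeled as a sphere of given radius
    around each offline-optimized waypoint (an assumed geometry)."""
    return [(np.asarray(p, dtype=float), radius) for p in offline_trajectory]

def project_to_corridor(position, corridor, step):
    """Clip an RL-proposed UAV position back into the corridor region for the
    current time step, so exploration never leaves the corridor."""
    center, radius = corridor[step]
    offset = position - center
    dist = np.linalg.norm(offset)
    if dist <= radius:
        return position
    return center + offset * (radius / dist)

# Toy usage: random exploration around a straight offline path.
offline_path = [np.array([10.0 * t, 0.0, 50.0]) for t in range(5)]  # x, y, altitude [m]
corridor = build_corridor(offline_path, radius=5.0)
rng = np.random.default_rng(0)
for t in range(5):
    candidate = offline_path[t] + rng.normal(scale=10.0, size=3)  # RL-proposed move
    safe_pos = project_to_corridor(candidate, corridor, t)
    print(t, np.round(safe_pos, 2))
```

Restricting the action space this way is one plausible reading of how the offline convex solution can "assist" the online RL agent: the agent still adapts to real-time ESI/CSI, but its search is bounded near trajectories already known to be good on average.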