Paper Title
Meta-Reinforcement Learning for Trajectory Design in Wireless UAV Networks
Paper Authors
Paper Abstract
In this paper, the design of an optimal trajectory for an energy-constrained drone operating in dynamic network environments is studied. In the considered model, a drone base station (DBS) is dispatched to provide uplink connectivity to ground users whose demand is dynamic and unpredictable. In this case, the DBS's trajectory must be adaptively adjusted to satisfy the dynamic user access requests. To this end, a meta-learning algorithm is proposed to adapt the DBS's trajectory when it encounters novel environments, by tuning a reinforcement learning (RL) solution. The meta-learning algorithm provides a solution that enables the DBS to adapt quickly to novel environments based on limited prior experience. The meta-tuned RL solution is shown to converge faster to the optimal coverage in unseen environments with considerably lower computational complexity than the baseline policy gradient algorithm. Simulation results show that the proposed meta-learning solution yields a 25% improvement in convergence speed and about a 10% improvement in the DBS's communication performance compared with a baseline policy gradient algorithm. Meanwhile, the probability that the DBS serves over 50% of user requests increases by about 27% compared with the baseline policy gradient algorithm.
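
The abstract describes a meta-learning procedure that tunes a policy gradient RL solution so that the DBS trajectory policy adapts quickly to unseen environments. The paper gives no code here; the snippet below is only a minimal first-order, MAML-style sketch of that idea under illustrative assumptions. The linear-softmax policy, the functions sample_task, episode_grad, and meta_train, the feature and action dimensions, and the step sizes alpha and beta are hypothetical placeholders, not the authors' implementation.

# Minimal sketch of MAML-style meta-policy-gradient adaptation, assuming a
# linear-softmax policy over discrete flight actions; environment dynamics,
# rewards, and hyperparameters are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)
N_ACTIONS = 4          # hypothetical candidate flight directions for the DBS
N_FEATURES = 8         # hypothetical state features (e.g., user-demand summary)


def sample_task():
    """Draw a 'novel environment': here, a random linear reward model."""
    return rng.normal(size=(N_FEATURES, N_ACTIONS))


def episode_grad(theta, task, n_steps=20):
    """REINFORCE-style gradient estimate of expected return for one task."""
    grad = np.zeros_like(theta)
    ret = 0.0
    for _ in range(n_steps):
        s = rng.normal(size=N_FEATURES)              # observed state (placeholder)
        logits = s @ theta
        p = np.exp(logits - logits.max())
        p /= p.sum()
        a = rng.choice(N_ACTIONS, p=p)
        r = (s @ task)[a]                            # task-specific reward
        # grad of log pi(a|s) for a linear-softmax policy
        dlogp = np.outer(s, -p)
        dlogp[:, a] += s
        grad += r * dlogp
        ret += r
    return grad / n_steps, ret / n_steps


def meta_train(meta_iters=200, tasks_per_batch=5, alpha=0.05, beta=0.01):
    """Outer loop: move meta-parameters so one inner PG step adapts well."""
    theta = np.zeros((N_FEATURES, N_ACTIONS))
    for _ in range(meta_iters):
        meta_grad = np.zeros_like(theta)
        for _ in range(tasks_per_batch):
            task = sample_task()
            g_inner, _ = episode_grad(theta, task)
            theta_adapted = theta + alpha * g_inner  # inner adaptation step
            g_outer, _ = episode_grad(theta_adapted, task)
            meta_grad += g_outer                     # first-order approximation
        theta += beta * meta_grad / tasks_per_batch
    return theta

The first-order approximation in meta_train avoids second-order derivatives of the policy gradient, keeping the per-iteration cost close to that of plain policy gradient; this mirrors the low-complexity claim in the abstract but is only one possible realization.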