Paper Title

Faded-Experience Trust Region Policy Optimization for Model-Free Power Allocation in Interference Channel

Paper Authors

Khoshkholgh, Mohammad G., Yanikomeroglu, Halim

Paper Abstract

Policy gradient reinforcement learning techniques enable an agent to learn an optimal action policy directly through interaction with the environment. Despite their advantages, however, they sometimes suffer from slow convergence. Inspired by how humans make decisions, we work toward enhancing convergence speed by augmenting the agent to memorize and use recently learned policies. We apply our method to trust-region policy optimization (TRPO), originally developed for locomotion tasks, and propose faded-experience (FE) TRPO. To substantiate its effectiveness, we use it to learn continuous power control in an interference channel when only noisy location information of the devices is available. Results indicate that FE-TRPO can almost double the learning speed compared to TRPO. Importantly, our method neither increases learning complexity nor imposes a performance loss.
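The abstract says the agent memorizes and reuses recently learned policies, but does not specify the fading rule. As a minimal illustrative sketch (not the paper's actual algorithm), one could blend stored policy parameter vectors with geometrically fading weights, so that more recent policies dominate; the function name `faded_mix` and the `fade` parameter are hypothetical:

```python
import numpy as np

def faded_mix(policy_history, fade=0.5):
    """Blend recent policy parameter vectors with geometrically
    fading weights: the newest policy gets weight ~1, the one
    before it ~fade, then ~fade**2, and so on (normalized).

    policy_history: list of 1-D parameter arrays, oldest first.
    """
    n = len(policy_history)
    # weight fade**k for the k-th most recent policy
    weights = np.array([fade ** (n - 1 - i) for i in range(n)])
    weights /= weights.sum()
    # weighted sum of the stored parameter vectors
    return sum(w * p for w, p in zip(weights, np.asarray(policy_history)))
```

With `fade=0.5` and two stored policies, the mixed parameters sit two-thirds of the way toward the newest one; setting `fade` near 0 recovers plain TRPO (only the latest policy matters), while larger values keep more of the older experience.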
