Distributed Uplink Beamforming in Cell-Free Networks Using Deep Reinforcement Learning

Paper Title

Distributed Uplink Beamforming in Cell-Free Networks Using Deep Reinforcement Learning

Paper Authors

Firas Fredj, Yasser Al-Eryani, Setareh Maghsudi, Mohamed Akrout, Ekram Hossain

Abstract

The emergence of new wireless technologies, together with the requirement of massive connectivity, results in several technical issues such as excessive interference, high computational demand for signal processing, and lengthy processing delays. In this work, we propose several beamforming techniques for an uplink cell-free network with centralized, semi-distributed, and fully distributed processing, all based on deep reinforcement learning (DRL). First, we propose a fully centralized beamforming method that uses the deep deterministic policy gradient algorithm (DDPG) with continuous space. We then enhance this method by enabling distributed experience at the access points (APs). Specifically, we develop a beamforming scheme that uses the distributed distributional deterministic policy gradients algorithm (D4PG), with the APs acting as the distributed agents. Finally, to decrease the computational complexity, we propose a fully distributed beamforming scheme that divides the beamforming computations among the APs. The results show that the D4PG scheme with distributed experience achieves the best performance irrespective of the network size. Furthermore, the proposed distributed beamforming technique outperforms the DDPG algorithm with centralized learning only in small-scale networks; the performance superiority of the DDPG model becomes more evident as the number of APs and/or users increases. Moreover, during the operation stage, all DRL models demonstrate a significantly shorter processing time than the conventional gradient descent (GD) solution.
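
As a rough illustration of the kind of objective a beamforming agent might be trained on, the sketch below computes an uplink sum rate from a set of receive combining weights. It assumes single-antenna APs and users, a textbook SINR expression, and the sum rate as the quantity of interest. None of these details are stated in the abstract, and the function name `uplink_sum_rate` and its arguments are hypothetical.

```python
import math


def uplink_sum_rate(H, W, p, noise_var):
    """Sum rate in bit/s/Hz under an assumed single-antenna uplink model.

    H[m][k]: channel coefficient from user k to AP m (assumed known).
    W[m][k]: combining weight that AP m applies when detecting user k.
    p[k]:    transmit power of user k.
    """
    num_aps = len(H)
    num_users = len(p)
    total = 0.0
    for k in range(num_users):
        # Received power of every user j after combining for user k:
        # p_j * |w_k^H h_j|^2
        gains = []
        for j in range(num_users):
            inner = sum(W[m][k].conjugate() * H[m][j] for m in range(num_aps))
            gains.append(p[j] * abs(inner) ** 2)
        signal = gains[k]
        interference = sum(gains) - signal
        # Noise is scaled by the squared norm of the combining vector.
        noise = noise_var * sum(abs(W[m][k]) ** 2 for m in range(num_aps))
        total += math.log2(1.0 + signal / (interference + noise))
    return total


# Two APs, two users, no cross-talk: each user sees an SINR of 1.
H = [[1, 0], [0, 1]]
W = [[1, 0], [0, 1]]
rate = uplink_sum_rate(H, W, [1.0, 1.0], 1.0)
```

In a DRL setting, a value like `rate` could serve as the reward and the entries of `W` as the continuous action. This is one plausible framing, not a description of the authors' exact formulation.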
