Paper Title
Hybrid Reinforcement Learning for STAR-RISs: A Coupled Phase-Shift Model Based Beamformer
Paper Authors
Paper Abstract
A simultaneous transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) assisted multi-user downlink multiple-input single-output (MISO) communication system is investigated. In contrast to the existing ideal STAR-RIS model assuming independent transmission and reflection phase-shift control, a practical coupled phase-shift model is considered. Then, a joint active and passive beamforming optimization problem is formulated for minimizing the long-term transmission power consumption, subject to the coupled phase-shift constraint and the minimum data rate constraint. Despite the coupled nature of the phase-shift model, the formulated problem is solved by invoking a hybrid continuous and discrete phase-shift control policy. Inspired by this observation, a pair of hybrid reinforcement learning (RL) algorithms, namely the hybrid deep deterministic policy gradient (hybrid DDPG) algorithm and the joint DDPG & deep Q-network (DDPG-DQN) based algorithm, are proposed. The hybrid DDPG algorithm controls the associated high-dimensional continuous and discrete actions by relying on the hybrid action mapping. By contrast, the joint DDPG-DQN algorithm constructs two Markov decision processes (MDPs) relying on an inner and an outer environment, thereby amalgamating the two agents to accomplish a joint hybrid control. Simulation results demonstrate that the STAR-RIS is superior to conventional RISs in terms of energy consumption. Furthermore, both proposed algorithms outperform the baseline DDPG algorithm, and the joint DDPG-DQN algorithm achieves a superior performance, albeit at an increased computational complexity.
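
To make the "hybrid action mapping" idea concrete, the following is a minimal illustrative sketch, not the paper's actual implementation: a single continuous actor output vector is split into a part used directly as continuous actions (e.g., phase shifts) and a part quantized onto a small set of discrete choices. The function name hybrid_action_map, the [-1, 1] output range, and the nearest-level quantization rule are assumptions made for illustration only.

import numpy as np

def hybrid_action_map(actor_output, num_elements, num_discrete_levels):
    """Map one raw continuous actor output onto a (continuous, discrete) action pair.

    actor_output: 1-D array in [-1, 1] of length 2 * num_elements
                  (first half -> continuous sub-action, second half -> discrete sub-action).
    """
    raw_cont = actor_output[:num_elements]
    raw_disc = actor_output[num_elements:]

    # Continuous sub-action: rescale [-1, 1] to a phase shift in [0, 2*pi).
    continuous_action = (raw_cont + 1.0) * np.pi

    # Discrete sub-action: quantize each entry of [-1, 1] onto the nearest of
    # num_discrete_levels evenly spaced levels (e.g., candidate phase offsets).
    bins = np.linspace(-1.0, 1.0, num_discrete_levels)
    discrete_action = np.argmin(np.abs(raw_disc[:, None] - bins[None, :]), axis=1)

    return continuous_action, discrete_action

# Usage example: 4 STAR-RIS elements, 2 discrete levels per element.
rng = np.random.default_rng(0)
out = rng.uniform(-1.0, 1.0, size=8)
cont, disc = hybrid_action_map(out, num_elements=4, num_discrete_levels=2)
print(cont, disc)

The point of such a mapping is that a standard continuous-action learner (here, a DDPG-style actor) can still produce the discrete decisions required by the coupled phase-shift constraint; the joint DDPG-DQN alternative described in the abstract instead assigns the discrete decisions to a separate DQN agent operating on its own MDP.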