Paper Title
Deep Reinforcement Learning for Dynamic Spectrum Sensing and Aggregation in Multi-Channel Wireless Networks
Paper Authors
Paper Abstract
In this paper, the problem of dynamic spectrum sensing and aggregation is investigated in a wireless network containing N correlated channels, where these channels are occupied or vacant according to an unknown joint 2-state Markov model. At each time slot, a single cognitive user with a certain bandwidth requirement either stays idle or selects a segment comprising C (C < N) contiguous channels to sense. The vacant channels in the selected segment are then aggregated to satisfy the user's requirement. After each transmission, the user receives a binary feedback signal indicating whether the transmission was successful (i.e., an ACK signal), and makes the next decision based on the sensed channel states. Here, we aim to find a policy that maximizes the number of successful transmissions without interrupting the primary users (PUs). Since the system environment is not fully observable, the problem can be formulated as a partially observable Markov decision process (POMDP). We implement a Deep Q-Network (DQN) to address the challenges of unknown system dynamics and computational expense. The performance of the DQN, Q-Learning, and the Improvident Policy with known system dynamics is evaluated through simulations. The simulation results show that the DQN can achieve near-optimal performance across different system scenarios based only on partial observations and ACK signals.
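To make the setup concrete, the sketch below simulates the sensing/aggregation loop described in the abstract and trains a small DQN on it. Everything here is an illustrative assumption rather than a detail from the paper: the parameter values (N, C, the bandwidth requirement B, transition probabilities), the use of independent two-state Markov channels (the paper considers correlated channels under a joint model), and a single Q-network without a target network or observation history.

```python
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

# --- Hypothetical parameters, not taken from the paper ---
N, C, B = 16, 4, 2          # total channels, segment width, bandwidth requirement
P_STAY_VACANT = 0.9         # P(vacant -> vacant)
P_STAY_BUSY = 0.8           # P(occupied -> occupied)

class SpectrumEnv:
    """Toy N-channel environment. Each channel evolves as an independent
    two-state Markov chain for simplicity; the paper assumes *correlated*
    channels, which would require a joint transition model."""
    def __init__(self):
        self.state = np.random.randint(0, 2, size=N)  # 1 = vacant, 0 = occupied

    def step(self, action):
        # action 0 = stay idle; action a >= 1 senses channels [a-1, a-1+C)
        obs = np.zeros(2 * N, dtype=np.float32)       # per-channel one-hot: [occupied, vacant]
        reward = 0.0
        if action > 0:
            lo = action - 1
            sensed = self.state[lo:lo + C]
            obs[2 * lo:2 * (lo + C)] = np.eye(2)[sensed].ravel()
            # ACK stand-in: aggregation succeeds if the segment holds >= B vacant channels
            reward = float(sensed.sum() >= B)
        # channels evolve one Markov step
        stay = np.where(self.state == 1, P_STAY_VACANT, P_STAY_BUSY)
        flip = np.random.rand(N) > stay
        self.state = np.where(flip, 1 - self.state, self.state)
        return obs, reward

n_actions = N - C + 2  # idle + every possible segment start position
qnet = nn.Sequential(nn.Linear(2 * N, 128), nn.ReLU(), nn.Linear(128, n_actions))
opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)
buffer = deque(maxlen=10_000)   # experience replay
gamma, eps = 0.9, 0.1

env = SpectrumEnv()
obs, _ = env.step(0)
for t in range(5000):
    # epsilon-greedy action selection on the latest (partial) observation
    if random.random() < eps:
        a = random.randrange(n_actions)
    else:
        with torch.no_grad():
            a = int(qnet(torch.from_numpy(obs)).argmax())
    next_obs, r = env.step(a)
    buffer.append((obs, a, r, next_obs))
    obs = next_obs
    if len(buffer) >= 64:
        batch = random.sample(buffer, 64)
        s, a_b, r_b, s2 = map(np.array, zip(*batch))
        q = qnet(torch.from_numpy(s)).gather(
            1, torch.from_numpy(a_b).long().unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            target = torch.from_numpy(r_b).float() + gamma * qnet(torch.from_numpy(s2)).max(1).values
        loss = nn.functional.mse_loss(q, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

Since the problem is a POMDP, a fuller implementation would feed the network a stack of recent observations and ACK signals rather than a single snapshot, and would add a periodically synchronized target network for stable Q-learning updates; both are omitted here to keep the sketch minimal.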