Paper Title
Reinforcement Learning Based Cooperative Coded Caching under Dynamic Popularities in Ultra-Dense Networks
Paper Authors
Abstract
For ultra-dense networks with wireless backhaul, the caching strategy at small base stations (SBSs), which usually have limited storage, is critical to serving massive high-data-rate requests. Since the content popularity profile varies with time in an unknown way, we exploit reinforcement learning (RL) to design a cooperative caching strategy with maximum distance separable (MDS) coding. We model the MDS-coded cooperative caching as a Markov decision process to capture the popularity dynamics and maximize the long-term expected cumulative traffic load served directly by the SBSs without accessing the macro base station. For the formulated problem, we first find the optimal solution for a small-scale system by embedding the cooperative MDS coding into Q-learning. To cope with the large-scale case, we approximate the state-action value function heuristically. The approximate function contains only a small number of learnable parameters and enables us to propose a fast and efficient action-selection approach, which dramatically reduces the complexity. Numerical results verify the optimality/near-optimality of the proposed RL-based algorithms and show their superiority over the baseline schemes. The proposed algorithms also exhibit good robustness to different environments.
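The abstract describes embedding coded caching decisions into Q-learning over a popularity-driven MDP. The snippet below is a minimal, illustrative sketch of tabular Q-learning with epsilon-greedy action selection on a toy caching environment; the environment, state/action encoding, and reward here are hypothetical stand-ins, not the paper's actual MDS-coded formulation.

```python
# Minimal tabular Q-learning sketch for a toy caching MDP (illustrative only).
# The real problem uses MDS-coded cache fractions across cooperating SBSs;
# here the "state" is just the index of the currently hottest file and the
# "action" is which single file to cache.
import random
from collections import defaultdict

GAMMA, ALPHA, EPS = 0.9, 0.1, 0.1   # discount factor, learning rate, exploration rate
N_FILES = 3                         # toy content library size
ACTIONS = list(range(N_FILES))      # toy action: which file to keep cached

def toy_env_step(state, action):
    """Hypothetical popularity dynamics: reward 1 if the cached file matches
    the hot file (request served locally), and the hot file occasionally changes."""
    reward = 1.0 if action == state else 0.0
    next_state = state if random.random() < 0.8 else random.randrange(N_FILES)
    return next_state, reward

Q = defaultdict(float)              # Q[(state, action)] -> value estimate

state = 0
for _ in range(50_000):
    # epsilon-greedy action selection
    if random.random() < EPS:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: Q[(state, a)])
    next_state, reward = toy_env_step(state, action)
    # standard Q-learning update toward the bootstrapped target
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
    state = next_state

# Learned greedy policy: cache the file that is currently popular.
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_FILES)})
```

In the paper's large-scale setting, such a table would be replaced by a parametric approximation of the state-action value function with only a few learnable parameters, which is what enables the fast action-selection approach mentioned above.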