Digital Twin Assisted Risk-Aware Sleep Mode Management Using Deep Q-Networks

Paper title

Digital Twin Assisted Risk-Aware Sleep Mode Management Using Deep Q-Networks

Authors

Masoudi, Meysam; Soroush, Ebrahim; Zander, Jens; Cavdar, Cicek

Abstract

Base stations (BSs) are the most energy-consuming segment of mobile networks. To reduce BS energy consumption, different components of a BS can be put to sleep when the BS is not active. Multiple sleep modes (SMs) are defined in the literature according to the activation and deactivation times of the BS components. In this study, we model the problem of BS energy saving with multiple sleep modes as a sequential Markov decision process (MDP) and propose an online, traffic-aware deep reinforcement learning approach to maximize long-term energy saving. However, there is a risk that the BS sleeps at the wrong time and imposes large delays on users. To tackle this issue, we propose using a digital twin (DT) model to encapsulate the dynamics of the investigated system and to estimate the risk of decision-making (RDM) in advance. We define a novel metric to quantify the RDM and predict the performance degradation. The RDM calculated by the DT is compared with a tolerable threshold set by the mobile operator. Based on this comparison, the BS can decide to deactivate the SMs, to re-train when needed to avoid taking high risks, or to activate the SMs to benefit from energy savings. For deep reinforcement learning, we use a long short-term memory (LSTM) network, which captures both the long-term and short-term dependencies in the input traffic, to approximate the Q-function. We train the LSTM network with the experience replay method on a real traffic data set obtained from an operator BS in Stockholm. The data set contains data rate information with very coarse time granularity. We therefore propose a scheme that uses the real network data set to generate a new data set which 1) has finer time granularity and 2) accounts for the bursty behavior of traffic data. Simulation results show that the proposed methods obtain considerable energy savings compared to the baselines, at the cost of a negligible number of delayed users.
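The threshold comparison described in the abstract can be illustrated with a minimal Python sketch. The abstract states only that the RDM estimated by the DT is compared with an operator-set threshold and that the outcome leads to one of three actions. The exact mapping below, the `retrain_margin` parameter, and all names (`SleepPolicy`, `gate_sleep_modes`) are hypothetical and are not taken from the paper.

```python
from enum import Enum


class SleepPolicy(Enum):
    """The three outcomes named in the abstract (labels are illustrative)."""
    ACTIVATE_SLEEP_MODES = "activate"        # let the agent use sleep modes
    DEACTIVATE_SLEEP_MODES = "deactivate"    # keep the BS awake
    DEACTIVATE_AND_RETRAIN = "retrain"       # keep the BS awake and re-train


def gate_sleep_modes(rdm: float, threshold: float,
                     retrain_margin: float = 0.1) -> SleepPolicy:
    """Compare the DT's risk estimate with the operator's threshold.

    Assumed rule (not from the paper):
      - rdm <= threshold                  -> sleep modes are allowed
      - rdm >  threshold + retrain_margin -> disable sleep modes and re-train
      - otherwise                         -> only disable sleep modes
    """
    if rdm <= threshold:
        return SleepPolicy.ACTIVATE_SLEEP_MODES
    if rdm > threshold + retrain_margin:
        return SleepPolicy.DEACTIVATE_AND_RETRAIN
    return SleepPolicy.DEACTIVATE_SLEEP_MODES


if __name__ == "__main__":
    # Example risk values are made up for illustration only.
    for risk in (0.05, 0.15, 0.50):
        print(risk, gate_sleep_modes(risk, threshold=0.10).name)
```

Separating the risk gate from the learning agent keeps the operator's tolerance an explicit, tunable input rather than something buried in the reward function. How the paper itself couples the two is not specified in the abstract.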
