Paper Title
Efficient Learning of Voltage Control Strategies via Model-based Deep Reinforcement Learning
Paper Authors
Paper Abstract
This article proposes a model-based deep reinforcement learning (DRL) method to design emergency control strategies for short-term voltage stability problems in power systems. Recent advances show promising results for model-free DRL methods in power systems, but model-free methods suffer from poor sample efficiency and long training times, both critical for making state-of-the-art DRL algorithms practically applicable. A DRL agent learns an optimal policy by trial and error while interacting with its environment, and it is desirable to minimize the agent's direct interaction with the real-world power grid because of the grid's safety-critical nature. Additionally, state-of-the-art DRL-based policies are mostly trained using a physics-based grid simulator whose dynamic simulation is computationally intensive, which lowers training efficiency. We propose a novel model-based DRL framework in which a deep neural network (DNN)-based dynamic surrogate model, rather than a real-world power grid or a physics-based simulation, is used within the policy learning framework, making the process faster and more sample-efficient. However, stabilizing model-based DRL is challenging because of the complex system dynamics of large-scale power systems. We address these issues by incorporating imitation learning to warm-start policy learning, reward shaping, and a multi-step surrogate loss. Finally, we achieve 97.5% sample efficiency and 87.7% training efficiency in an application to the IEEE 300-bus test system.
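To make two of the abstract's technical ingredients concrete, the sketch below shows how a DNN dynamic surrogate model might be trained with a multi-step surrogate loss: the surrogate is rolled forward several steps from a ground-truth state and penalized for prediction error at every step, which discourages compounding one-step errors during policy rollouts. This is a minimal illustration under assumed conventions, not the paper's actual implementation; the names (`SurrogateModel`, `multi_step_surrogate_loss`), the network architecture, and the tensor layout are all hypothetical.

```python
# Hypothetical sketch of multi-step surrogate-model training (PyTorch).
# All class/function names and sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SurrogateModel(nn.Module):
    """DNN that predicts the next grid state from (state, action)."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))

def multi_step_surrogate_loss(model: SurrogateModel,
                              states: torch.Tensor,
                              actions: torch.Tensor,
                              horizon: int) -> torch.Tensor:
    """Roll the surrogate forward `horizon` steps from states[:, 0] and
    average the prediction error over every step, not just the first.

    states:  (batch, horizon + 1, state_dim) ground-truth trajectory
    actions: (batch, horizon, action_dim)    actions actually applied
    """
    pred = states[:, 0]
    loss = states.new_zeros(())
    for t in range(horizon):
        pred = model(pred, actions[:, t])            # predicted next state
        loss = loss + F.mse_loss(pred, states[:, t + 1])
    return loss / horizon
```

Once trained, such a surrogate can stand in for the computationally intensive physics-based simulator inside the policy learning loop, which is the source of the sample- and training-efficiency gains the abstract reports; the imitation-learning warm start and reward shaping would be applied on the policy side and are not shown here.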