Paper Title
Efficient Learning of Voltage Control Strategies via Model-based Deep Reinforcement Learning
Paper Authors
Paper Abstract
This article proposes a model-based deep reinforcement learning (DRL) method to design emergency control strategies for short-term voltage stability problems in power systems. Recent advances show promising results for model-free DRL methods in power systems, but model-free methods suffer from poor sample efficiency and long training times, both critical for making state-of-the-art DRL algorithms practically applicable. A DRL agent learns an optimal policy by trial and error while interacting with its environment, and it is desirable to minimize the agent's direct interaction with the real-world power grid because of the grid's safety-critical nature. Additionally, state-of-the-art DRL-based policies are mostly trained using a physics-based grid simulator whose dynamic simulation is computationally intensive, which lowers training efficiency. We propose a novel model-based DRL framework in which a deep neural network (DNN)-based dynamic surrogate model, rather than a real-world power grid or a physics-based simulation, is used within the policy learning framework, making the process faster and more sample-efficient. However, stabilizing model-based DRL is challenging because of the complex system dynamics of large-scale power systems. We address these issues by incorporating imitation learning to warm-start policy learning, reward shaping, and a multi-step surrogate loss. Finally, we achieve 97.5% sample efficiency and 87.7% training efficiency in an application to the IEEE 300-bus test system.
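To make two of the abstract's technical ingredients concrete, the sketch below shows how a DNN dynamic surrogate model might be trained with a multi-step surrogate loss: the surrogate is rolled forward several steps from a ground-truth state and penalized for prediction error at every step, which discourages compounding one-step errors during policy rollouts. This is a minimal illustration under assumed conventions, not the paper's actual implementation; the names (`SurrogateModel`, `multi_step_surrogate_loss`), the network architecture, and the tensor layout are all hypothetical.

```python
# Hypothetical sketch of multi-step surrogate-model training (PyTorch).
# All class/function names and sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SurrogateModel(nn.Module):
    """DNN that predicts the next grid state from (state, action)."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))

def multi_step_surrogate_loss(model: SurrogateModel,
                              states: torch.Tensor,
                              actions: torch.Tensor,
                              horizon: int) -> torch.Tensor:
    """Roll the surrogate forward `horizon` steps from states[:, 0] and
    average the prediction error over every step, not just the first.

    states:  (batch, horizon + 1, state_dim) ground-truth trajectory
    actions: (batch, horizon, action_dim)    actions actually applied
    """
    pred = states[:, 0]
    loss = states.new_zeros(())
    for t in range(horizon):
        pred = model(pred, actions[:, t])            # predicted next state
        loss = loss + F.mse_loss(pred, states[:, t + 1])
    return loss / horizon
```

Once trained, such a surrogate can stand in for the computationally intensive physics-based simulator inside the policy learning loop, which is the source of the sample- and training-efficiency gains the abstract reports; the imitation-learning warm start and reward shaping would be applied on the policy side and are not shown here.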