Paper Title

Steady-State Error Compensation in Reference Tracking and Disturbance Rejection Problems for Reinforcement Learning-Based Control

Paper Authors

Weber, Daniel; Schenke, Maximilian; Wallscheid, Oliver

Paper Abstract

Reinforcement learning (RL) is a promising, upcoming topic in automatic control applications. Where classical control approaches require a priori system knowledge, data-driven control approaches like RL allow a model-free controller design procedure, rendering them emergent techniques for systems with changing plant structures and varying parameters. While it was already shown in various applications that the transient control behavior for complex systems can be sufficiently handled by RL, the challenge of non-vanishing steady-state control errors remains, which arises from the use of control policy approximations and finite training times. To overcome this issue, an integral action state augmentation (IASA) for actor-critic-based RL controllers is introduced that mimics an integrating feedback, inspired by the delta-input formulation within model predictive control. This augmentation does not require any expert knowledge, leaving the approach model-free. As a result, the RL controller learns how to suppress steady-state control deviations much more effectively. Two exemplary applications from the domain of electrical energy engineering validate the benefit of the developed method both for reference tracking and disturbance rejection. In comparison to a standard deep deterministic policy gradient (DDPG) setup, the suggested IASA extension reduces the steady-state error by up to 52% within the considered validation scenarios.
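The abstract describes the core mechanism in two parts: the policy outputs action increments (delta inputs) that are integrated into the applied plant input, and the state is augmented so that the integrated control error is visible to the actor and critic. The sketch below illustrates one way such an augmentation could be wrapped around an environment. It is a minimal illustration under stated assumptions, not the authors' reference implementation: the class name `IASAWrapper`, the Gym-style `reset`/`step` interface, and the observation indices `meas_idx`/`ref_idx` are all hypothetical.

```python
import numpy as np

class IASAWrapper:
    """Minimal sketch of an integral action state augmentation (IASA)
    wrapper for a reference-tracking environment.

    Assumptions (not taken from the paper): the wrapped `env` follows a
    Gym-style API (`reset`/`step`), its observation vector contains the
    controlled variable at `meas_idx` and its reference at `ref_idx`,
    and the agent's network output is interpreted as a *change* of the
    applied action (delta input), as in the delta-input formulation of
    model predictive control.
    """

    def __init__(self, env, meas_idx, ref_idx, action_limit=1.0):
        self.env = env
        self.meas_idx = meas_idx        # index of controlled variable in obs
        self.ref_idx = ref_idx          # index of reference value in obs
        self.action_limit = action_limit
        self.u = None                   # absolute plant input held by the wrapper
        self.error_integral = 0.0       # accumulated control error

    def reset(self):
        obs = self.env.reset()
        # Assumes a Gym-style action_space on the wrapped environment.
        self.u = np.zeros(self.env.action_space.shape)
        self.error_integral = 0.0
        return self._augment(obs)

    def step(self, delta_u):
        # Delta-input formulation: the agent outputs the action increment;
        # the wrapper integrates it to the absolute plant input and saturates.
        self.u = np.clip(self.u + delta_u, -self.action_limit, self.action_limit)
        obs, reward, done, info = self.env.step(self.u)
        # Accumulate the control error so the policy can observe it,
        # mimicking the state of an integrating feedback element.
        error = obs[self.ref_idx] - obs[self.meas_idx]
        self.error_integral += error
        return self._augment(obs), reward, done, info

    def _augment(self, obs):
        # Augmented state: original observation, integrated error, and the
        # current absolute action (needed because the agent only sees deltas).
        return np.concatenate([obs, [self.error_integral], self.u])
```

Used this way, a standard actor-critic agent such as DDPG interacts only with the wrapper: it observes the augmented state and outputs delta actions, so no model knowledge or expert tuning enters the design, which matches the model-free claim of the abstract.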
