Paper Title
A reinforcement learning method with closed-loop stability guarantee
Paper Authors
Paper Abstract
Reinforcement learning (RL) in the context of control systems offers wide possibilities for controller adaptation. Given an infinite-horizon cost function, the so-called critic of RL approximates it with a neural network and sends this information to the controller (called the "actor"). However, the issue of closed-loop stability under an RL method is still not fully addressed. Since the critic delivers merely an approximation of the value function of the corresponding infinite-horizon problem, no guarantee can be given in general as to whether the actor's actions stabilize the system. Different approaches to this issue exist. The current work offers a particular one which, starting with a (not necessarily smooth) control Lyapunov function (CLF), derives an online RL scheme in such a way that a practical semi-global stability property of the closed loop can be established. The approach logically continues the authors' work on parameterized controllers and Lyapunov-like constraints for RL, whereas the CLF now appears merely in one of the constraints of the control scheme. The analysis of the closed-loop behavior is done in a sample-and-hold (SH) manner, thus offering a certain insight into the digital realization. A case study with a non-holonomic integrator shows the capability of the derived method to optimize the given cost function compared to a nominal stabilizing controller.
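To fix notation, the following is an illustrative sketch rather than the paper's exact formulation: a generic infinite-horizon running cost r (the quantity the critic approximates), the standard Brockett non-holonomic integrator referred to in the case study, and a generic sample-and-hold decrease condition on a CLF V with some rate alpha > 0. The symbols r, V, alpha, and the precise form of the constraint are placeholders assumed here, not taken from the paper.

```latex
% Illustrative only: generic cost, the standard non-holonomic (Brockett)
% integrator, and a generic sample-and-hold CLF decrease condition.
\begin{align*}
  J\bigl(x_0, u(\cdot)\bigr) &= \int_0^{\infty} r\bigl(x(t), u(t)\bigr)\,\mathrm{d}t
  && \text{infinite-horizon cost approximated by the critic,} \\
  \dot{x}_1 = u_1, \quad \dot{x}_2 &= u_2, \quad \dot{x}_3 = x_1 u_2 - x_2 u_1
  && \text{non-holonomic integrator (case study),} \\
  V\bigl(x(t_{k+1})\bigr) - V\bigl(x(t_k)\bigr) &\le -\alpha\,\delta
  && \text{CLF decrease over one sampling period } \delta = t_{k+1} - t_k.
\end{align*}
```

The last line is the kind of Lyapunov-like constraint the abstract alludes to: the RL-generated action is admissible only if it decreases V over each sampling period, which is what underpins the practical semi-global stability claim in the sample-and-hold setting.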