Paper Title

A reinforcement learning method with closed-loop stability guarantee

Paper Authors

Pavel Osinenko, Lukas Beckenbach, Thomas Göhrt, Stefan Streif

Abstract

Reinforcement learning (RL) in the context of control systems offers wide possibilities for controller adaptation. Given an infinite-horizon cost function, the so-called critic of RL approximates it with a neural net and sends this information to the controller (called the "actor"). However, the issue of closed-loop stability under an RL method is still not fully addressed. Since the critic delivers merely an approximation to the value function of the corresponding infinite-horizon problem, no guarantee can be given in general as to whether the actor's actions stabilize the system. Different approaches to this issue exist. The current work offers a particular one which, starting with a (not necessarily smooth) control Lyapunov function (CLF), derives an online RL scheme in such a way that a practical semi-global stability property of the closed loop can be established. The approach logically continues the authors' work on parameterized controllers and Lyapunov-like constraints for RL, whereas the CLF now appears merely in one of the constraints of the control scheme. The analysis of the closed-loop behavior is done in a sample-and-hold (SH) manner, thus offering a certain insight into the digital realization. A case study with a non-holonomic integrator shows the capability of the derived method to optimize the given cost function compared to a nominal stabilizing controller.
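The abstract describes an actor that optimizes a critic's cost estimate while a CLF-based constraint safeguards stability, evaluated in a sample-and-hold loop. The sketch below only illustrates that general pattern and is not the authors' scheme: it assumes a hypothetical single-integrator plant, the smooth CLF V(x) = ||x||², a random-search actor, and a simple temporal-difference critic, whereas the paper works with a not-necessarily-smooth CLF and a non-holonomic integrator case study.

```python
# Minimal illustrative sketch (not the authors' exact scheme): a sample-and-hold
# actor step that minimizes a critic's cost estimate subject to a CLF decay
# constraint, with a fallback to the most CLF-decreasing candidate action.
# All modelling choices here (plant, CLF, costs, critic form) are assumptions.
import numpy as np

dt = 0.05          # sample-and-hold period
alpha = 0.1        # required relative CLF decay per step

def plant(x, u):
    """Hypothetical single-integrator plant, x_next = x + dt * u."""
    return x + dt * u

def clf(x):
    """Control Lyapunov function candidate V(x) = ||x||^2 (smooth, for illustration)."""
    return float(x @ x)

def running_cost(x, u):
    """Stage cost accumulated toward the infinite-horizon objective."""
    return float(x @ x + 0.1 * u @ u)

def critic(x, w):
    """Critic: diagonal quadratic approximation of the cost-to-go, weights w >= 0."""
    return float(w @ (x * x))

def actor(x, w, n_candidates=200, u_max=1.0, rng=np.random.default_rng(0)):
    """Pick the candidate action minimizing stage cost plus critic estimate among
    those satisfying the CLF decay constraint; otherwise return the action that
    decreases the CLF the most (stability takes priority over optimization)."""
    candidates = rng.uniform(-u_max, u_max, size=(n_candidates, x.size))
    best_u, best_val = None, np.inf
    fallback_u, fallback_decay = candidates[0], np.inf
    for u in candidates:
        x_next = plant(x, u)
        decay = clf(x_next) - clf(x)
        if decay < fallback_decay:            # remember steepest CLF descent
            fallback_u, fallback_decay = u, decay
        if decay <= -alpha * dt * clf(x):     # CLF decay constraint
            val = running_cost(x, u) + critic(x_next, w)
            if val < best_val:
                best_u, best_val = u, val
    return best_u if best_u is not None else fallback_u

def critic_update(x, u, x_next, w, lr=1e-2):
    """One temporal-difference step on the critic weights, projected to w >= 0."""
    td = running_cost(x, u) + critic(x_next, w) - critic(x, w)
    w = w + lr * td * (x * x)                 # gradient of the critic w.r.t. w
    return np.maximum(w, 0.0)

# Closed-loop simulation in sample-and-hold fashion.
x = np.array([1.0, -0.5])
w = np.ones_like(x)
for _ in range(200):
    u = actor(x, w)
    x_next = plant(x, u)
    w = critic_update(x, u, x_next, w)
    x = x_next
print("final state:", x, "critic weights:", w)
```

The split of roles mirrors the abstract's logic: the constraint, not the critic, is what yields the (practical) stability property, so the actor falls back to pure CLF descent whenever no candidate action satisfies the decay condition.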
