Paper Title
A reinforcement learning method with closed-loop stability guarantee
Paper Authors
Paper Abstract
Reinforcement learning (RL) in the context of control systems offers wide possibilities for controller adaptation. Given an infinite-horizon cost function, the so-called critic of RL approximates it with a neural network and sends this information to the controller (called the "actor"). However, the issue of closed-loop stability under an RL method is still not fully addressed. Since the critic delivers merely an approximation of the value function of the corresponding infinite-horizon problem, no guarantee can be given in general as to whether the actor's actions stabilize the system. Different approaches to this issue exist. The current work offers a particular one which, starting with a (not necessarily smooth) control Lyapunov function (CLF), derives an online RL scheme in such a way that a practical semi-global stability property of the closed loop can be established. The approach logically continues the authors' work on parameterized controllers and Lyapunov-like constraints for RL, whereas the CLF now appears merely in one of the constraints of the control scheme. The analysis of the closed-loop behavior is done in a sample-and-hold (SH) manner, thus offering a certain insight into the digital realization. A case study with a non-holonomic integrator shows the capability of the derived method to optimize the given cost function compared to a nominal stabilizing controller.
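To fix notation, the following is an illustrative sketch rather than the paper's exact formulation: a generic infinite-horizon running cost r (the quantity the critic approximates), the standard Brockett non-holonomic integrator referred to in the case study, and a generic sample-and-hold decrease condition on a CLF V with some rate alpha > 0. The symbols r, V, alpha, and the precise form of the constraint are placeholders assumed here, not taken from the paper.

```latex
% Illustrative only: generic cost, the standard non-holonomic (Brockett)
% integrator, and a generic sample-and-hold CLF decrease condition.
\begin{align*}
  J\bigl(x_0, u(\cdot)\bigr) &= \int_0^{\infty} r\bigl(x(t), u(t)\bigr)\,\mathrm{d}t
  && \text{infinite-horizon cost approximated by the critic,} \\
  \dot{x}_1 = u_1, \quad \dot{x}_2 &= u_2, \quad \dot{x}_3 = x_1 u_2 - x_2 u_1
  && \text{non-holonomic integrator (case study),} \\
  V\bigl(x(t_{k+1})\bigr) - V\bigl(x(t_k)\bigr) &\le -\alpha\,\delta
  && \text{CLF decrease over one sampling period } \delta = t_{k+1} - t_k.
\end{align*}
```

The last line is the kind of Lyapunov-like constraint the abstract alludes to: the RL-generated action is admissible only if it decreases V over each sampling period, which is what underpins the practical semi-global stability claim in the sample-and-hold setting.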