Paper Title

Reinforcement Learning Control of Constrained Dynamic Systems with Uniformly Ultimate Boundedness Stability Guarantee

Paper Authors

Minghao Han, Yuan Tian, Lixian Zhang, Jun Wang, Wei Pan

Abstract

Reinforcement learning (RL) is promising for complicated stochastic nonlinear control problems. Without using a mathematical model, an optimal controller can be learned through trial and error from data evaluated by certain performance criteria. However, the data-based learning approach is notorious for not guaranteeing stability, which is the most fundamental property of any control system. In this paper, the classical Lyapunov method is explored to analyze uniform ultimate boundedness (UUB) stability solely based on data, without using a mathematical model. It is further shown how RL with a UUB guarantee can be applied to control dynamic systems with safety constraints. Based on the theoretical results, both off-policy and on-policy learning algorithms are proposed. As a result, optimal controllers can be learned that guarantee UUB of the closed-loop system both at convergence and during learning. The proposed algorithms are evaluated on a series of robotic continuous control tasks with safety constraints. In comparison with existing RL algorithms, the proposed method achieves superior performance in terms of maintaining safety. As a qualitative evaluation of stability, our method shows impressive resilience even in the presence of external disturbances.
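As an illustrative sketch (not the paper's actual algorithm), the core idea of a data-based UUB condition can be pictured as an empirical check over sampled transitions: outside some ultimate bound, a Lyapunov candidate should decrease in expectation along the data. The quadratic candidate, decrease rate `alpha`, and `boundary` value below are all hypothetical choices for illustration:

```python
import numpy as np

def lyapunov(s):
    # Hypothetical quadratic Lyapunov candidate L(s) = ||s||^2.
    return float(np.dot(s, s))

def uub_condition_satisfied(transitions, alpha=0.1, boundary=0.05):
    """Empirically check a data-based Lyapunov decrease condition:
    for every sampled state outside the ultimate bound (L(s) > boundary),
    require L(s') - L(s) <= -alpha * L(s). States already inside the
    bound need not decrease, which is what distinguishes UUB from
    asymptotic stability to the origin."""
    for s, s_next in transitions:
        if lyapunov(s) <= boundary:
            continue  # inside the ultimate bound: no decrease required
        if lyapunov(s_next) - lyapunov(s) > -alpha * lyapunov(s):
            return False
    return True

# Toy transitions from a contracting system s' = 0.9 * s.
states = [np.array([1.0, 0.5]), np.array([0.2, -0.3])]
transitions = [(s, 0.9 * s) for s in states]
print(uub_condition_satisfied(transitions))  # → True
```

In the paper's setting this kind of condition is folded into the learning objective (for both the off-policy and on-policy variants) rather than checked post hoc, so that the controller satisfies it during training as well as at convergence.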
