Paper Title
Actor-Critic Reinforcement Learning for Control with Stability Guarantee
Paper Authors
Paper Abstract
Reinforcement Learning (RL) and its integration with deep learning have achieved impressive performance in various robotic control tasks, ranging from motion planning and navigation to end-to-end visual manipulation. However, stability is not guaranteed in model-free RL that relies solely on data. From a control-theoretic perspective, stability is the most important property of any control system, since it is closely related to the safety, robustness, and reliability of robotic systems. In this paper, we propose an actor-critic RL framework for control that guarantees closed-loop stability by employing the classical Lyapunov method from control theory. First, a data-based stability theorem is proposed for stochastic nonlinear systems modeled by a Markov decision process. Then we show that the stability condition can be exploited as the critic in actor-critic RL to learn a controller/policy. Finally, the effectiveness of our approach is evaluated on several well-known three-dimensional robot control tasks and a synthetic biology gene network tracking task in three different popular physics simulation platforms. As an empirical evaluation of the advantage of stability, we show that the learned policies enable the systems to recover, to a certain extent, to the equilibrium or to way-points when perturbed by uncertainties such as system parameter variations and external disturbances.
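The abstract states that a Lyapunov stability condition is used as the critic within an actor-critic update. Below is a minimal, hypothetical PyTorch sketch (not the authors' released implementation) of that general idea: a learned Lyapunov critic is regressed against accumulated cost, and the actor is penalized when a sampled "energy decrease" condition is violated. The network sizes, the decrease rate `alpha`, and the penalty weight `beta` are illustrative assumptions.

```python
# Hypothetical sketch of a Lyapunov-critic-based actor-critic update.
# Hyperparameters and architectures are assumptions, not the paper's values.
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, in_dim, out_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x)

state_dim, action_dim = 8, 2
actor = MLP(state_dim, action_dim)            # deterministic policy, for brevity
lyapunov = MLP(state_dim + action_dim, 1)     # Lyapunov critic L(s, a)
opt = torch.optim.Adam(
    list(actor.parameters()) + list(lyapunov.parameters()), lr=3e-4
)

alpha, beta = 0.1, 1.0                        # decrease rate and penalty weight (assumed)

def lyap(s, a):
    # Softplus keeps the Lyapunov candidate non-negative.
    return nn.functional.softplus(lyapunov(torch.cat([s, a], dim=-1)))

# One illustrative update on a synthetic batch of transitions (s, a, cost, s').
s = torch.randn(32, state_dim)
a = torch.randn(32, action_dim)
cost = torch.rand(32, 1)                      # per-step cost, e.g. tracking error
s_next = torch.randn(32, state_dim)

L_sa = lyap(s, a)
L_next = lyap(s_next, actor(s_next))

# Critic regression: the Lyapunov value should reflect discounted accumulated cost.
critic_loss = ((L_sa - (cost + 0.99 * L_next.detach())) ** 2).mean()

# Actor penalty: encourage the sampled decrease condition
#   E[L(s', pi(s')) - L(s, a)] <= -alpha * L(s, a)
decrease_violation = torch.relu(
    L_next - L_sa.detach() + alpha * L_sa.detach()
).mean()
actor_loss = beta * decrease_violation

opt.zero_grad()
(critic_loss + actor_loss).backward()
opt.step()
```

In this sketch the stability requirement enters only as a soft penalty on the actor; the paper's actual formulation of the data-based stability theorem and its training objective should be consulted for the precise condition and guarantees.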