Paper Title
A framework for online, stabilizing reinforcement learning
Paper Authors
Paper Abstract
Online reinforcement learning is concerned with training an agent on the fly via dynamic interaction with the environment. Here, due to the specifics of the application, it is generally not possible to perform long pre-training, as is commonly done in offline, model-free approaches akin to dynamic programming. Such applications are found more frequently in industry than in purely digital domains, such as cloud services, video games, or database management, where reinforcement learning has been demonstrating success. Online reinforcement learning, in contrast, is closer to classical control, which utilizes some model knowledge about the environment. Stability of the closed loop (agent plus environment) is a major challenge for such online approaches. In this paper, we tackle this problem by a special fusion of online reinforcement learning with elements of classical control, namely, Lyapunov stability theory. The idea is to deploy the agent at once, without pre-training, and let it learn an approximately optimal policy under specially designed constraints that guarantee stability. The resulting approach was tested in an extensive experimental study with a mobile robot, with a nominal parking controller serving as the baseline. The suggested agent always parked the robot successfully while significantly improving on the baseline cost. Although many approaches may be exploited for mobile robot control, we suggest that the experiments demonstrate the potential of online reinforcement learning agents based on Lyapunov-like constraints. The presented methodology may be utilized in safety-critical industrial applications where stability is necessary.
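To make the abstract's core mechanism concrete, the following is a minimal, hypothetical sketch of online learning under a Lyapunov-like constraint: at each step the agent picks, among candidate actions, the cheapest one (as ranked by a critic learned on the fly) whose successor state decreases a candidate Lyapunov function, and falls back to a nominal stabilizing controller otherwise. The double-integrator dynamics, the quadratic Lyapunov candidate V(x) = x'Px, the gains, and all names are assumptions for illustration, not the paper's exact construction.

import numpy as np

# Hypothetical illustration only: discrete-time double-integrator dynamics
# x+ = A x + B u and a quadratic Lyapunov candidate V(x) = x' P x. All
# matrices, gains, and step sizes below are assumed for the sketch.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.005],
              [0.1]])
P = np.array([[2.0, 0.5],
              [0.5, 1.0]])

def V(x):
    # Candidate Lyapunov function (assumed quadratic form).
    return float(x @ P @ x)

def nominal_controller(x):
    # Stabilizing fallback; the gain is assumed (e.g., an LQR gain for (A, B)).
    K = np.array([[3.16, 2.58]])
    return -(K @ x)

def stage_cost(x, u):
    return float(x @ x + 0.1 * u @ u)

def features(x):
    # Quadratic features for a simple linear-in-weights critic (assumed).
    return np.array([x[0] ** 2, x[0] * x[1], x[1] ** 2])

def constrained_action(x, w, candidates, eta=1e-3):
    # Among candidate actions, pick the one minimizing stage cost plus the
    # critic's estimate at the successor state, subject to the Lyapunov-like
    # decrease condition V(x+) - V(x) <= -eta * ||x||^2; if no candidate
    # satisfies it, fall back to the nominal stabilizing controller.
    best_u, best_val = None, np.inf
    for u in candidates:
        x_next = A @ x + B @ u
        if V(x_next) - V(x) <= -eta * float(x @ x):
            val = stage_cost(x, u) + float(w @ features(x_next))
            if val < best_val:
                best_u, best_val = u, val
    return best_u if best_u is not None else nominal_controller(x)

# Online loop: no pre-training; the critic weights w are adapted on the fly
# by a one-step temporal-difference update while the constraint keeps the
# closed loop stable.
w = np.zeros(3)
x = np.array([1.0, 0.0])
candidates = [np.array([u]) for u in np.linspace(-2.0, 2.0, 21)]
for _ in range(200):
    u = constrained_action(x, w, candidates)
    x_next = A @ x + B @ u
    td_error = (stage_cost(x, u)
                + 0.99 * float(w @ features(x_next))
                - float(w @ features(x)))
    w += 0.01 * td_error * features(x)
    x = x_next

The fallback mirrors the role a nominal controller (such as the paper's parking controller) could play in such a scheme: the admissible action set is never empty, so a stabilizing action is available even while the critic is still inaccurate.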