Paper Title

LCRL: Certified Policy Synthesis via Logically-Constrained Reinforcement Learning

Paper Authors

Hosein Hasanbeig, Daniel Kroening, Alessandro Abate

Paper Abstract

LCRL is a software tool that implements model-free Reinforcement Learning (RL) algorithms over unknown Markov Decision Processes (MDPs), synthesising policies that satisfy a given linear temporal specification with maximal probability. LCRL leverages partially deterministic finite-state machines known as Limit Deterministic Büchi Automata (LDBA) to express a given linear temporal specification. A reward function for the RL algorithm is shaped on-the-fly, based on the structure of the LDBA. Theoretical guarantees under proper assumptions ensure the convergence of the RL algorithm to an optimal policy that maximises the satisfaction probability. We present case studies to demonstrate the applicability, ease of use, scalability, and performance of LCRL. Owing to the LDBA-guided exploration and LCRL's model-free architecture, we observe robust performance that also scales well compared to standard RL approaches (whenever these are applicable to LTL specifications). Full instructions on how to execute all the case studies in this paper are provided on the GitHub page that accompanies the LCRL distribution: www.github.com/grockious/lcrl.
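
To make the abstract's reward-shaping idea concrete, below is a minimal, self-contained sketch of Q-learning on the product of an unknown MDP and an LDBA, where the reward is formed on-the-fly from the automaton's accepting states. The `LineWorld` environment, the `SimpleLDBA` automaton (for the property "eventually goal"), and their interfaces are illustrative assumptions made for this sketch; they are not LCRL's actual API.

```python
import random
from collections import defaultdict


class SimpleLDBA:
    """Hypothetical LDBA for the LTL property F goal ("eventually goal").

    States: 0 = initial, 1 = accepting sink. Transitions read the label
    of the MDP state just entered. Illustrative only, not LCRL's API.
    """
    accepting = {1}

    def step(self, q, label):
        return 1 if (q == 1 or label == "goal") else q


class LineWorld:
    """Toy unknown MDP: a 1-D corridor whose rightmost cell is labelled 'goal'."""
    actions = ["left", "right"]

    def __init__(self, n=6):
        self.n = n

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        self.s = min(self.s + 1, self.n - 1) if a == "right" else max(self.s - 1, 0)
        label = "goal" if self.s == self.n - 1 else ""
        return self.s, label, label == "goal"


def ldba_guided_q_learning(env, ldba, episodes=2000, alpha=0.1, gamma=0.99, eps=0.1):
    """Q-learning over the product state (mdp_state, ldba_state).

    The reward is shaped on-the-fly: the agent is paid only when the LDBA
    visits an accepting state, so maximising return aligns with maximising
    the probability of satisfying the specification (under the paper's
    assumptions). The MDP itself is only sampled, never modelled.
    """
    Q = defaultdict(float)  # keyed by ((mdp_state, ldba_state), action)
    for _ in range(episodes):
        s, q = env.reset(), 0
        done = False
        while not done:
            # epsilon-greedy exploration over the product state (s, q)
            if random.random() < eps:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda act: Q[((s, q), act)])

            s2, label, done = env.step(a)   # sample the unknown MDP
            q2 = ldba.step(q, label)        # LDBA tracks progress on the spec
            r = 1.0 if q2 in ldba.accepting else 0.0  # on-the-fly reward

            best_next = 0.0 if done else max(Q[((s2, q2), a2)] for a2 in env.actions)
            Q[((s, q), a)] += alpha * (r + gamma * best_next - Q[((s, q), a)])
            s, q = s2, q2
    return Q


if __name__ == "__main__":
    Q = ldba_guided_q_learning(LineWorld(), SimpleLDBA())
```

In LCRL itself the LDBA is constructed from a general LTL formula, and the tool also handles the non-deterministic (epsilon) jumps between the automaton's components; the sketch above only illustrates the product construction and on-the-fly reward that the abstract describes.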
