Paper Title

Model-based Reinforcement Learning from Signal Temporal Logic Specifications

Authors

Parv Kapoor, Anand Balakrishnan, Jyotirmoy V. Deshmukh

Abstract

Techniques based on Reinforcement Learning (RL) are increasingly being used to design control policies for robotic systems. RL fundamentally relies on state-based reward functions to encode the desired behavior of the robot, and bad reward functions are prone to exploitation by the learning agent, leading to behavior that is undesirable in the best case and critically dangerous in the worst. On the other hand, designing good reward functions for complex tasks is a challenging problem. In this paper, we propose expressing desired high-level robot behavior using a formal specification language known as Signal Temporal Logic (STL) as an alternative to reward/cost functions. We use STL specifications in conjunction with model-based learning to design model predictive controllers that try to optimize the satisfaction of the STL specification over a finite time horizon. The proposed algorithm is empirically evaluated on simulations of robotic systems, such as a pick-and-place robotic arm, and on adaptive cruise control for autonomous vehicles.
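The key quantity such a controller optimizes is the quantitative (robustness) semantics of STL: a real-valued score that is positive when a trajectory satisfies the specification and negative when it violates it. The sketch below illustrates this idea for two basic temporal operators over discrete-time signals; the list-based signal encoding, the predicate `x > c`, and all function names are illustrative assumptions for this example, not the paper's implementation.

```python
# Minimal sketch of STL robustness over a discrete-time signal.
# Robustness > 0 means the formula is satisfied; < 0 means it is violated.

def rho_pred(signal, t, c):
    """Robustness of the atomic predicate x > c at time step t."""
    return signal[t] - c

def rho_always(signal, a, b, c):
    """Robustness of G_[a,b](x > c): the worst case over the interval."""
    return min(rho_pred(signal, t, c) for t in range(a, b + 1))

def rho_eventually(signal, a, b, c):
    """Robustness of F_[a,b](x > c): the best case over the interval."""
    return max(rho_pred(signal, t, c) for t in range(a, b + 1))

# A trajectory that dips below the threshold c = 0 at t = 2:
x = [1.0, 0.5, -0.2, 0.8, 1.5]
print(rho_always(x, 0, 4, 0.0))      # negative: G_[0,4](x > 0) is violated
print(rho_eventually(x, 0, 4, 0.0))  # positive: F_[0,4](x > 0) is satisfied
```

A model predictive controller in this setting would roll out candidate control sequences through a learned dynamics model and pick the sequence whose predicted trajectory maximizes such a robustness score over the planning horizon.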
