Paper Title

Safe Reinforcement Learning by Imagining the Near Future

Authors

Garrett Thomas, Yuping Luo, Tengyu Ma

Abstract

Safe reinforcement learning is a promising path toward applying reinforcement learning algorithms to real-world problems, where suboptimal behaviors may lead to actual negative consequences. In this work, we focus on the setting where unsafe states can be avoided by planning ahead a short time into the future. In this setting, a model-based agent with a sufficiently accurate model can avoid unsafe states. We devise a model-based algorithm that heavily penalizes unsafe trajectories, and derive guarantees that our algorithm can avoid unsafe states under certain assumptions. Experiments demonstrate that our algorithm can achieve competitive rewards with fewer safety violations in several continuous control tasks.
