Safe Reinforcement Learning via Confidence-Based Filters

Paper Title

Safe Reinforcement Learning via Confidence-Based Filters

Authors

Sebastian Curi, Armin Lederer, Sandra Hirche, Andreas Krause

Abstract

Ensuring safety is a crucial challenge when deploying reinforcement learning (RL) to real-world systems. We develop confidence-based safety filters, a control-theoretic approach for certifying state safety constraints for nominal policies learned via standard RL techniques, based on probabilistic dynamics models. Our approach is based on a reformulation of state constraints in terms of cost functions, reducing safety verification to a standard RL task. By exploiting the concept of hallucinating inputs, we extend this formulation to determine a "backup" policy that is safe for the unknown system with high probability. Finally, the nominal policy is minimally adjusted at every time step during a roll-out towards the backup policy, such that safe recovery can be guaranteed afterwards. We provide formal safety guarantees, and empirically demonstrate the effectiveness of our approach.
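The filtering idea in the abstract can be illustrated with a toy sketch. It is not the paper's algorithm. It assumes a scalar system x' = x + b*u whose gain b is only known to lie in an interval, a hand-picked linear backup policy, and a constant nominal policy. Plain interval propagation stands in for the probabilistic dynamics model and the hallucinated inputs, and every name and constant below is hypothetical. The state constraint is written as a violation cost, a candidate action is certified when the pessimistic backup roll-out accumulates zero cost, and the nominal action is blended toward the backup action only as far as needed.

```python
from itertools import product

X_MAX = 1.0            # illustrative state constraint |x| <= X_MAX
B_BOUNDS = (0.8, 1.2)  # assumed confidence interval for the unknown gain b
HORIZON = 20           # length of the backup roll-out used for certification


def violation_cost(lo, hi):
    """Constraint as a cost: 1 if some state in [lo, hi] may violate |x| <= X_MAX."""
    return 0.0 if (-X_MAX <= lo and hi <= X_MAX) else 1.0


def step_interval(lo, hi, policy):
    """Propagate a state interval through x' = x + b*u for the extreme values of b.

    Checking only the corners is exact here because the toy policies make the
    map linear in x and in b; a general model would need a proper reachability tool.
    """
    nxt = [x + b * policy(x) for x, b in product((lo, hi), B_BOUNDS)]
    return min(nxt), max(nxt)


def backup_policy(x):
    """Hypothetical backup policy: contracts the state toward the origin."""
    return -0.5 * x


def nominal_policy(x):
    """Hypothetical nominal policy: ignores the constraint and pushes right."""
    return 0.5


def is_certified(x, u):
    """True if applying u once and then the backup policy incurs zero violation cost."""
    lo, hi = step_interval(x, x, lambda _: u)
    total = violation_cost(lo, hi)
    for _ in range(HORIZON):
        lo, hi = step_interval(lo, hi, backup_policy)
        total += violation_cost(lo, hi)
    return total == 0.0


def safety_filter(x, n_grid=11):
    """Return the certified action closest to the nominal one on a blending grid."""
    u_nom, u_safe = nominal_policy(x), backup_policy(x)
    for i in range(n_grid):
        alpha = 1.0 - i / (n_grid - 1)  # alpha = 1 keeps the nominal action
        u = alpha * u_nom + (1 - alpha) * u_safe
        if is_certified(x, u):
            return u
    return u_safe  # fall back to the backup action
```

Near the origin the filter passes the nominal action through unchanged. Close to the constraint boundary it returns a blended action that still allows the backup policy to recover. The grid search over the blending weight is a crude stand-in for a minimal-adjustment step.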
