Paper Title

Safe Wasserstein Constrained Deep Q-Learning

Authors

Kandel, Aaron, Moura, Scott J.

Abstract

This paper presents a distributionally robust Q-Learning algorithm (DrQ) which leverages Wasserstein ambiguity sets to provide idealistic probabilistic out-of-sample safety guarantees during online learning. First, we follow past work by separating the constraint functions from the principal objective to create a hierarchy of machines which estimate the feasible state-action space within the constrained Markov decision process (CMDP). DrQ works within this framework by augmenting constraint costs with tightening offset variables obtained through Wasserstein distributionally robust optimization (DRO). These offset variables correspond to worst-case distributions of modeling error characterized by the TD-errors of the constraint Q-functions. This procedure allows us to safely approach the nominal constraint boundaries. Using a case study of lithium-ion battery fast charging, we explore how idealistic safety guarantees translate to generally improved safety relative to conventional methods.
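To illustrate the core mechanism the abstract describes, here is a minimal sketch of how a Wasserstein-DRO tightening offset could be derived from constraint TD-errors and used to restrict the feasible action set. This is not the paper's implementation: the function names, the Lipschitz-based worst-case bound (empirical mean plus Lipschitz constant times ball radius, a standard 1-Wasserstein DRO upper bound), and the fixed threshold are all simplifying assumptions for illustration.

```python
import numpy as np

def wasserstein_offset(td_errors, radius, lipschitz=1.0):
    # Hypothetical tightening offset: bound the worst-case mean modeling
    # error over a 1-Wasserstein ball of the given radius centered at the
    # empirical TD-error distribution. For an L-Lipschitz loss, a standard
    # upper bound is the empirical mean plus L * radius. This simplifies
    # the paper's DRO formulation for illustration.
    return np.mean(td_errors) + lipschitz * radius

def safe_actions(q_constraint, offset, threshold=0.0):
    # Keep only actions whose offset-tightened constraint Q-value stays
    # within the nominal constraint boundary `threshold`.
    return np.where(q_constraint + offset <= threshold)[0]

# Toy example: TD-errors of a constraint Q-function and per-action
# constraint Q-values at the current state (illustrative numbers).
td_errors = np.array([0.02, -0.01, 0.05, 0.00])
offset = wasserstein_offset(td_errors, radius=0.1)
q_c = np.array([-0.5, -0.2, 0.1, -0.05])
feasible = safe_actions(q_c, offset)  # indices of actions deemed safe
```

The offset shrinks the feasible set relative to the nominal boundary, which is how the method "safely approaches" the constraint: as the TD-errors and the ambiguity radius shrink during learning, the tightening relaxes toward the true boundary.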
