Paper Title

Versatile Offline Imitation from Observations and Examples via Regularized State-Occupancy Matching

Authors

Ma, Yecheng Jason, Shen, Andrew, Jayaraman, Dinesh, Bastani, Osbert

Abstract

We propose State Matching Offline DIstribution Correction Estimation (SMODICE), a novel and versatile regression-based offline imitation learning (IL) algorithm derived via state-occupancy matching. We show that the SMODICE objective admits a simple optimization procedure through an application of Fenchel duality and an analytic solution in tabular MDPs. Without requiring access to expert actions, SMODICE can be effectively applied to three offline IL settings: (i) imitation from observations (IfO), (ii) IfO with a dynamics- or morphologically-mismatched expert, and (iii) example-based reinforcement learning, which we show can be formulated as a state-occupancy matching problem. We extensively evaluate SMODICE on both gridworld environments and high-dimensional offline benchmarks. Our results demonstrate that SMODICE is effective for all three problem settings and significantly outperforms the prior state of the art.
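To make "state-occupancy matching" concrete, here is a minimal sketch for a small tabular MDP whose transition model is assumed to be known. It computes the discounted state occupancy d^π(s) = (1 − γ) Σ_t γ^t Pr(s_t = s) by iterating the Bellman flow equation, then scores a policy by the divergence KL(d^π ‖ d^E) to an expert's state occupancy. The helper names (`state_occupancy`, `kl_divergence`), the toy two-state MDP, and the choice of KL in this direction are illustrative assumptions. This is not the SMODICE algorithm itself, which, per the abstract, optimizes a regularized version of such an objective offline through Fenchel duality rather than evaluating it with a known model.

```python
import math
from typing import List

Matrix = List[List[float]]


def state_occupancy(P: List[Matrix], pi: Matrix, mu0: List[float],
                    gamma: float, tol: float = 1e-12, max_iter: int = 100_000) -> List[float]:
    """Discounted state occupancy d(s) = (1 - gamma) * sum_t gamma^t Pr(s_t = s).

    P[s][a][s2] is the transition probability, pi[s][a] the policy, mu0 the
    initial-state distribution. Iterates the Bellman flow equation
        d(s2) = (1 - gamma) * mu0(s2) + gamma * sum_s d(s) * P_pi(s2 | s),
    which is a gamma-contraction, so it converges to the unique fixed point.
    """
    n = len(mu0)
    # State-to-state kernel under pi: P_pi[s][s2] = sum_a pi(a|s) * P(s2|s,a).
    P_pi = [[sum(pi[s][a] * P[s][a][s2] for a in range(len(pi[s])))
             for s2 in range(n)] for s in range(n)]
    d = list(mu0)  # any distribution works as a starting point
    for _ in range(max_iter):
        new_d = [(1 - gamma) * mu0[s2] + gamma * sum(d[s] * P_pi[s][s2] for s in range(n))
                 for s2 in range(n)]
        if max(abs(x - y) for x, y in zip(new_d, d)) < tol:
            return new_d
        d = new_d
    return d


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) for discrete distributions; infinite if q(s) = 0 where p(s) > 0."""
    total = 0.0
    for ps, qs in zip(p, q):
        if ps > 0:
            if qs == 0:
                return math.inf
            total += ps * math.log(ps / qs)
    return total


# Hypothetical toy MDP: 2 states, action 0 = "stay", action 1 = "switch".
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # from state 0: stay -> 0, switch -> 1
    [[0.0, 1.0], [1.0, 0.0]],  # from state 1: stay -> 1, switch -> 0
]
mu0 = [1.0, 0.0]
gamma = 0.9
pi_expert = [[0.0, 1.0], [1.0, 0.0]]   # move to state 1, then stay there
pi_uniform = [[0.5, 0.5], [0.5, 0.5]]

d_E = state_occupancy(P, pi_expert, mu0, gamma)
d_U = state_occupancy(P, pi_uniform, mu0, gamma)
print([round(x, 3) for x in d_E])          # [0.1, 0.9]
print([round(x, 3) for x in d_U])          # [0.55, 0.45]
print(round(kl_divergence(d_U, d_E), 3))   # 0.626
```

The divergence is zero for the expert policy itself, while the uniform policy spends too much time in state 0 and is penalized. The objective compares state visitation only, so it needs no expert actions, which is the property the abstract emphasizes for the IfO settings.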
