Imitating Unknown Policies via Exploration

Paper title

Imitating Unknown Policies via Exploration

Authors

Nathan Gavenski, Juarez Monteiro, Roger Granada, Felipe Meneguzzi, Rodrigo C. Barros

Abstract

Behavioral cloning is an imitation learning technique that teaches an agent how to behave through expert demonstrations. Recent approaches use self-supervision on fully-observable, unlabeled snapshots of the states to decode state pairs into actions. However, the iterative learning scheme of these techniques is prone to getting stuck in bad local minima. We address this limitation by incorporating a two-phase model into the original framework, which learns from unlabeled observations via exploration, substantially improving traditional behavioral cloning by exploiting (i) a sampling mechanism to prevent bad local minima, (ii) a sampling mechanism to improve exploration, and (iii) self-attention modules to capture global features. The resulting technique outperforms the previous state of the art in four different environments by a large margin.
