Paper Title

Imitating Unknown Policies via Exploration

Authors

Nathan Gavenski, Juarez Monteiro, Roger Granada, Felipe Meneguzzi, Rodrigo C. Barros

Abstract

Behavioral cloning is an imitation learning technique that teaches an agent how to behave through expert demonstrations. Recent approaches use self-supervision of fully-observable, unlabeled snapshots of the states to decode state pairs into actions. However, the iterative learning scheme of these techniques is prone to getting stuck in bad local minima. We address these limitations by incorporating a two-phase model into the original framework, which learns from unlabeled observations via exploration. This substantially improves traditional behavioral cloning by exploiting (i) a sampling mechanism to prevent bad local minima, (ii) a sampling mechanism to improve exploration, and (iii) self-attention modules to capture global features. The resulting technique outperforms the previous state of the art in four different environments by a large margin.
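
To make the loop the abstract builds on more concrete, here is a toy tabular sketch in the spirit of behavioral cloning from observation. The agent explores and records (s, a, s') triples. An inverse dynamics model (IDM) fit on those triples decodes the expert's unlabeled state pairs into actions. A policy is then cloned from the decoded pairs, and further exploration samples actions from the policy's distribution instead of taking the argmax. Everything here is a hypothetical illustration: the 1-D chain environment, the count-based IDM and policy, and names such as `fit_idm` and `rollout`. It does not reproduce the paper's neural two-phase model, its self-attention modules, or the exact form of its two sampling mechanisms, which the abstract does not specify.

```python
import random
from collections import Counter, defaultdict

ACTIONS = (0, 1, 2)  # hypothetical action set: 0 = left, 1 = right, 2 = stay
N_STATES = 6         # hypothetical chain length


def step(s, a):
    """Deterministic 1-D chain: move left/right or stay, clamped to [0, N_STATES - 1]."""
    delta = {0: -1, 1: 1, 2: 0}[a]
    return min(max(s + delta, 0), N_STATES - 1)


def fit_idm(transitions):
    """Count-based inverse dynamics model: map each state change (s' - s) to its most frequent action."""
    counts = defaultdict(Counter)
    for s, a, s_next in transitions:
        counts[s_next - s][a] += 1
    return {d: c.most_common(1)[0][0] for d, c in counts.items()}


def label_demos(idm, demo_states):
    """Decode unlabeled expert state pairs into (state, action); skip changes the IDM never saw."""
    pairs = []
    for s, s_next in zip(demo_states, demo_states[1:]):
        a = idm.get(s_next - s)
        if a is not None:
            pairs.append((s, a))
    return pairs


def fit_policy(pairs, smoothing=0.1):
    """Behavioral cloning with a tabular policy: smoothed per-state action frequencies."""
    counts = defaultdict(Counter)
    for s, a in pairs:
        counts[s][a] += 1
    policy = {}
    for s in range(N_STATES):
        weights = [counts[s][a] + smoothing for a in ACTIONS]
        total = sum(weights)
        policy[s] = [w / total for w in weights]
    return policy


def rollout(policy, rng, horizon=20):
    """Collect (s, a, s') by sampling actions from the policy rather than taking the argmax."""
    s, transitions = 0, []
    for _ in range(horizon):
        a = rng.choices(ACTIONS, weights=policy[s])[0]
        s_next = step(s, a)
        transitions.append((s, a, s_next))
        s = s_next
    return transitions


def train(demo_states, iterations=5, seed=0):
    """Alternate IDM fitting, demo labeling, policy cloning, and sampled exploration."""
    rng = random.Random(seed)
    uniform = {s: [1 / len(ACTIONS)] * len(ACTIONS) for s in range(N_STATES)}
    transitions = rollout(uniform, rng, horizon=50)  # initial random exploration
    policy = uniform
    for _ in range(iterations):
        idm = fit_idm(transitions)
        policy = fit_policy(label_demos(idm, demo_states))
        transitions += rollout(policy, rng)  # new experience feeds the next IDM fit
    return policy
```

For example, `train(list(range(N_STATES)))` gives the sketch a state-only demonstration of an expert walking right along the chain. The cloned policy should then put most of its probability on action 1 in every non-final state. Sampling from the smoothed distribution, rather than always taking the most likely action, keeps the agent trying other actions. This is one simple way exploration can keep supplying the IDM with varied transitions.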