Paper Title

Auto-Encoding Adversarial Imitation Learning

Paper Authors

Kaifeng Zhang, Rui Zhao, Ziming Zhang, Yang Gao

Paper Abstract

Reinforcement learning (RL) provides a powerful framework for decision-making, but its application in practice often requires a carefully designed reward function. Adversarial Imitation Learning (AIL) enables automatic policy acquisition without access to a reward signal from the environment. In this work, we propose Auto-Encoding Adversarial Imitation Learning (AEAIL), a robust and scalable AIL framework. To induce expert policies from demonstrations, AEAIL uses the reconstruction error of an auto-encoder as the reward signal, which provides more information for optimizing policies than prior discriminator-based rewards. We then use the derived objective functions to train the auto-encoder and the agent policy. Experiments show that AEAIL outperforms state-of-the-art methods on both state-based and image-based environments. More importantly, AEAIL exhibits much better robustness when the expert demonstrations are noisy.
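
To make the core mechanism concrete, below is a minimal PyTorch sketch of the idea described in the abstract: an auto-encoder's reconstruction error serves both as the adversarial training signal and as the policy reward. This is an illustrative sketch, not the authors' implementation; the network sizes, the squared-error metric, and the reward sign convention are all assumptions.

```python
import torch
import torch.nn as nn


class AutoEncoder(nn.Module):
    """MLP auto-encoder over (state, action) vectors -- illustrative only."""

    def __init__(self, input_dim: int, latent_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))


def reconstruction_error(ae: AutoEncoder, x: torch.Tensor) -> torch.Tensor:
    """Per-sample squared reconstruction error (assumed error metric)."""
    return ((ae(x) - x) ** 2).mean(dim=-1)


def reward(ae: AutoEncoder, x: torch.Tensor) -> torch.Tensor:
    """Reward for the policy: agent samples the auto-encoder reconstructs
    well (i.e., that look expert-like) score higher. The negation is an
    assumed sign convention, not taken from the paper."""
    with torch.no_grad():
        return -reconstruction_error(ae, x)


def auto_encoder_loss(ae: AutoEncoder,
                      expert_batch: torch.Tensor,
                      agent_batch: torch.Tensor) -> torch.Tensor:
    """Adversarial objective for the auto-encoder (discriminator role):
    drive reconstruction error down on expert data and up on agent data."""
    return (reconstruction_error(ae, expert_batch).mean()
            - reconstruction_error(ae, agent_batch).mean())
```

In a full training loop, the auto-encoder update would alternate with a standard RL update (e.g., PPO) that maximizes this reward on agent rollouts, mirroring the discriminator/policy alternation in GAIL-style methods.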
