Paper Title
$f$-GAIL: Learning $f$-Divergence for Generative Adversarial Imitation Learning
Paper Authors
Paper Abstract
Imitation learning (IL) aims to learn a policy from expert demonstrations that minimizes the discrepancy between the learner's and the expert's behaviors. Various imitation learning algorithms have been proposed, each with a different pre-determined divergence to quantify this discrepancy. This naturally gives rise to the following question: given a set of expert demonstrations, which divergence can recover the expert policy more accurately and with higher data efficiency? In this work, we propose $f$-GAIL, a new generative adversarial imitation learning (GAIL) model that automatically learns a discrepancy measure from the $f$-divergence family as well as a policy capable of producing expert-like behaviors. Compared with IL baselines using various predefined divergence measures, $f$-GAIL learns better policies with higher data efficiency on six physics-based control tasks.
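To make the abstract's terminology concrete, an $f$-divergence between the expert occupancy measure $\rho_E$ and the learner occupancy measure $\rho_\pi$ (this notation is assumed here, not taken from the abstract) is defined, for a convex function $f$ with $f(1) = 0$, as

$$ D_f(\rho_E \,\|\, \rho_\pi) \;=\; \mathbb{E}_{(s,a)\sim \rho_\pi}\!\left[ f\!\left( \frac{\rho_E(s,a)}{\rho_\pi(s,a)} \right) \right]. $$

GAIL-style methods typically optimize the standard variational lower bound of this quantity, where $f^*$ denotes the convex conjugate of $f$ and $T$ plays the role of a discriminator over state-action pairs:

$$ D_f(\rho_E \,\|\, \rho_\pi) \;\ge\; \sup_{T}\; \mathbb{E}_{(s,a)\sim \rho_E}\!\left[ T(s,a) \right] \;-\; \mathbb{E}_{(s,a)\sim \rho_\pi}\!\left[ f^*\!\big(T(s,a)\big) \right]. $$

Learning the divergence, as described in the abstract, can then be read as additionally searching over $f$ within the family, e.g. $\min_{\pi} \max_{f,\, T} \; \mathbb{E}_{\rho_E}[T(s,a)] - \mathbb{E}_{\rho_\pi}[f^*(T(s,a))]$; this objective is a sketch of the general setup, not necessarily the paper's exact formulation.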