Paper Title
$f$-GAIL: Learning $f$-Divergence for Generative Adversarial Imitation Learning
Paper Authors
Paper Abstract
Imitation learning (IL) aims to learn a policy from expert demonstrations that minimizes the discrepancy between the learner's and the expert's behaviors. Various imitation learning algorithms have been proposed, each with a different pre-determined divergence to quantify this discrepancy. This naturally gives rise to the following question: given a set of expert demonstrations, which divergence can recover the expert policy more accurately and with higher data efficiency? In this work, we propose $f$-GAIL, a new generative adversarial imitation learning (GAIL) model that automatically learns a discrepancy measure from the $f$-divergence family as well as a policy capable of producing expert-like behaviors. Compared with IL baselines using various predefined divergence measures, $f$-GAIL learns better policies with higher data efficiency on six physics-based control tasks.
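To make the abstract's terminology concrete, an $f$-divergence between the expert occupancy measure $\rho_E$ and the learner occupancy measure $\rho_\pi$ (this notation is assumed here, not taken from the abstract) is defined, for a convex function $f$ with $f(1) = 0$, as

$$ D_f(\rho_E \,\|\, \rho_\pi) \;=\; \mathbb{E}_{(s,a)\sim \rho_\pi}\!\left[ f\!\left( \frac{\rho_E(s,a)}{\rho_\pi(s,a)} \right) \right]. $$

GAIL-style methods typically optimize the standard variational lower bound of this quantity, where $f^*$ denotes the convex conjugate of $f$ and $T$ plays the role of a discriminator over state-action pairs:

$$ D_f(\rho_E \,\|\, \rho_\pi) \;\ge\; \sup_{T}\; \mathbb{E}_{(s,a)\sim \rho_E}\!\left[ T(s,a) \right] \;-\; \mathbb{E}_{(s,a)\sim \rho_\pi}\!\left[ f^*\!\big(T(s,a)\big) \right]. $$

Learning the divergence, as described in the abstract, can then be read as additionally searching over $f$ within the family, e.g. $\min_{\pi} \max_{f,\, T} \; \mathbb{E}_{\rho_E}[T(s,a)] - \mathbb{E}_{\rho_\pi}[f^*(T(s,a))]$; this objective is a sketch of the general setup, not necessarily the paper's exact formulation.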