Paper Title
Generalization Guarantees for Imitation Learning
Paper Authors
Paper Abstract
Control policies from imitation learning can often fail to generalize to novel environments due to imperfect demonstrations or the inability of imitation learning algorithms to accurately infer the expert's policies. In this paper, we present rigorous generalization guarantees for imitation learning by leveraging the Probably Approximately Correct (PAC)-Bayes framework to provide upper bounds on the expected cost of policies in novel environments. We propose a two-stage training method where a latent policy distribution is first embedded with multi-modal expert behavior using a conditional variational autoencoder, and then "fine-tuned" in new training environments to explicitly optimize the generalization bound. We demonstrate strong generalization bounds and their tightness relative to empirical performance in simulation for (i) grasping diverse mugs, (ii) planar pushing with visual feedback, and (iii) vision-based indoor navigation, as well as through hardware experiments for the two manipulation tasks.
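The abstract's "fine-tuning" stage optimizes a PAC-Bayes generalization bound over training environments. As a minimal illustration of how such a bound is evaluated, the sketch below numerically inverts the standard PAC-Bayes-kl inequality (Maurer-style) to turn an empirical cost, a posterior-prior KL divergence, and an environment count into an upper bound on the expected cost in novel environments. The function names and the exact form of the bound are assumptions for illustration; the paper's precise bound may differ.

```python
import math

def kl_bernoulli(q, p):
    # KL divergence between Bernoulli(q) and Bernoulli(p), with clamping
    # to avoid log(0) at the boundaries.
    eps = 1e-12
    q = min(max(q, eps), 1 - eps)
    p = min(max(p, eps), 1 - eps)
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def pac_bayes_bound(empirical_cost, kl_posterior_prior, n_envs, delta=0.01):
    """Upper-bound the true expected cost via the PAC-Bayes-kl inequality.

    Assumed (standard Maurer-style) form, holding with probability 1 - delta:
        kl(empirical_cost || true_cost)
            <= (KL(posterior || prior) + log(2 * sqrt(N) / delta)) / N
    We invert it by binary search for the largest cost consistent with the
    inequality. Costs are assumed to lie in [0, 1].
    """
    rhs = (kl_posterior_prior + math.log(2 * math.sqrt(n_envs) / delta)) / n_envs
    # kl(q || c) is increasing in c for c >= q, so binary-search the largest
    # c in [empirical_cost, 1] still satisfying the inequality.
    lo, hi = empirical_cost, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if kl_bernoulli(empirical_cost, mid) <= rhs:
            lo = mid
        else:
            hi = mid
    return lo
```

For example, with an empirical cost of 0.1 over 5000 training environments and a posterior-prior KL of 2.0, the bound certifies an expected cost somewhat above 0.1; a larger KL term loosens the bound, which is why minimizing the bound during fine-tuning trades off empirical cost against divergence from the prior learned in the first (behavior-embedding) stage.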