Paper title
Learning Diverse Options via InfoMax Termination Critic
Paper authors
Paper abstract
We consider the problem of autonomously learning reusable temporally extended actions, or options, in reinforcement learning. While options can speed up transfer learning by serving as reusable building blocks, learning reusable options for an unknown task distribution remains challenging. Motivated by the recent success of mutual information (MI) based skill learning, we hypothesize that more diverse options are more reusable. To this end, we propose a method for learning termination conditions of options by maximizing the MI between options and their corresponding state transitions. We derive a scalable approximation of this MI maximization via gradient ascent, yielding the InfoMax Termination Critic (IMTC) algorithm. Our experiments demonstrate that, when combined with an intrinsic option learning method, IMTC significantly improves the diversity of learned options without extrinsic rewards. Moreover, we test the reusability of the learned options by transferring them to various tasks, confirming that IMTC enables quick adaptation, especially in complex domains where an agent needs to manipulate objects.
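To make the objective in the abstract concrete, the following is a minimal sketch of the quantity being maximized: the mutual information between an option and the state transition it produces. This is not the IMTC algorithm, which the abstract describes as a scalable gradient-ascent approximation. It is a plug-in estimate on a toy discrete dataset, and the function name `empirical_mi`, the state labels, and the sample lists are all hypothetical.

```python
import math
from collections import Counter


def empirical_mi(samples):
    """Plug-in estimate of I(O; T) in nats from (option, transition) pairs.

    Illustrative only: each sample is (option_id, transition), where a
    transition is assumed to be a hashable (start_state, end_state) pair.
    """
    n = len(samples)
    joint = Counter(samples)                      # counts of (o, t)
    count_o = Counter(o for o, _ in samples)      # marginal counts of o
    count_t = Counter(t for _, t in samples)      # marginal counts of t
    mi = 0.0
    for (o, t), c in joint.items():
        # p(o, t) * log( p(o, t) / (p(o) * p(t)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (count_o[o] * count_t[t]))
    return mi


# Hypothetical data: two options starting from state "A".
# Diverse options: each option always produces its own transition.
diverse = [(0, ("A", "B")), (0, ("A", "B")), (1, ("A", "C")), (1, ("A", "C"))]
# Overlapping options: both options produce the same mix of transitions.
overlapping = [(0, ("A", "B")), (0, ("A", "C")), (1, ("A", "B")), (1, ("A", "C"))]

mi_diverse = empirical_mi(diverse)          # equals log(2) for this data
mi_overlapping = empirical_mi(overlapping)  # equals 0 for this data
```

In this toy setting, options whose transitions are distinguishable from one another yield higher MI than options that behave identically, which is the sense in which maximizing MI encourages diversity.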