Paper title
Latent Skill Planning for Exploration and Transfer
Paper authors
Paper abstract
To quickly solve new tasks in complex environments, intelligent agents need to build up reusable knowledge. For example, a learned world model captures knowledge about the environment that applies to new tasks. Similarly, skills capture general behaviors that can apply to new tasks. In this paper, we investigate how these two approaches can be integrated into a single reinforcement learning agent. Specifically, we leverage the idea of partial amortization for fast adaptation at test time. To this end, actions are produced by a policy that is learned over time, while the skills it conditions on are chosen using online planning. Across a suite of challenging locomotion tasks, we demonstrate the benefits of our design decisions and show improved sample efficiency, both in single tasks and in transfer from one task to another, compared to competitive baselines. Videos are available at: https://sites.google.com/view/latent-skill-planning/
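
The split described in the abstract (an amortized, skill-conditioned policy for actions, plus online planning over skills) can be illustrated with a toy sketch. This is not the paper's implementation: the one-dimensional dynamics, the reward, the hand-written `policy`, and the use of the cross-entropy method (CEM) as the planner are all assumptions made for illustration, and every function name below is hypothetical.

```python
import numpy as np

def policy(state, skill):
    # Hypothetical stand-in for the learned, skill-conditioned policy:
    # it steers the state toward the value encoded by the skill.
    return np.clip(skill - state, -1.0, 1.0)

def model_step(state, action, goal):
    # Hypothetical stand-in for the learned world model: toy dynamics
    # plus a reward that penalizes squared distance to the goal.
    next_state = state + 0.5 * action
    reward = -(next_state - goal) ** 2
    return next_state, reward

def plan_skill(state, goal, horizon=10, iters=5, pop=64, n_elite=8, seed=0):
    """Pick a skill online with CEM, scoring candidates by imagined rollouts."""
    rng = np.random.default_rng(seed)
    mean, std = 0.0, 2.0  # initial Gaussian belief over the skill
    for _ in range(iters):
        skills = rng.normal(mean, std, size=pop)  # candidate skills
        s = np.full(pop, state, dtype=float)      # one rollout per candidate
        returns = np.zeros(pop)
        for _ in range(horizon):
            a = policy(s, skills)                 # actions come from the policy
            s, r = model_step(s, a, goal)         # imagined step in the model
            returns += r
        elite = skills[np.argsort(returns)[-n_elite:]]  # best candidates
        mean, std = elite.mean(), elite.std() + 1e-6    # refit the belief
    return float(mean)

if __name__ == "__main__":
    print("planned skill:", plan_skill(state=0.0, goal=1.5))
```

In this sketch only the skill is optimized at decision time; the policy itself is fixed during planning, which is the sense in which the amortization is partial. Moving to a new task would change only the reward inside `model_step`, while `policy` is reused unchanged.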