Paper Title
Skill-based Model-based Reinforcement Learning
Paper Authors
Paper Abstract
Model-based reinforcement learning (RL) is a sample-efficient way of learning complex behaviors by leveraging a learned single-step dynamics model to plan actions in imagination. However, planning every action for long-horizon tasks is not practical, akin to a human planning out every muscle movement. Instead, humans efficiently plan with high-level skills to solve complex tasks. From this intuition, we propose a Skill-based Model-based RL framework (SkiMo) that enables planning in the skill space using a skill dynamics model, which directly predicts the skill outcomes, rather than predicting all small details in the intermediate states, step by step. For accurate and efficient long-term planning, we jointly learn the skill dynamics model and a skill repertoire from prior experience. We then harness the learned skill dynamics model to accurately simulate and plan over long horizons in the skill space, which enables efficient downstream learning of long-horizon, sparse-reward tasks. Experimental results in navigation and manipulation domains show that SkiMo extends the temporal horizon of model-based approaches and improves the sample efficiency for both model-based RL and skill-based RL. Code and videos are available at https://clvrai.com/skimo.
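To make the abstract's core idea concrete, below is a minimal sketch of skill-space planning with a learned skill dynamics model: the model jumps H environment steps per skill, so a plan over a few skill latents covers a long horizon with only a few model calls. Everything here (module names, dimensions, the cross-entropy-method planner, and the toy reward) is an illustrative assumption, not code from the SkiMo repository.

```python
# Hypothetical sketch of skill-space planning; not the authors' implementation.
import torch
import torch.nn as nn

STATE_DIM, SKILL_DIM = 32, 8   # assumed sizes, chosen for the demo
H = 10                          # env steps abstracted away by one skill

class SkillDynamics(nn.Module):
    """Predicts the state after executing skill z for H steps: s' = f(s, z)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + SKILL_DIM, 256), nn.ELU(),
            nn.Linear(256, STATE_DIM),
        )

    def forward(self, s, z):
        return self.net(torch.cat([s, z], dim=-1))

@torch.no_grad()
def plan_skills(dynamics, reward_fn, s0, n_skills=5, pop=64, elites=8, iters=5):
    """Cross-entropy-method search over a sequence of skill latents.

    Rolls out the *skill* dynamics model, so a plan of n_skills skills
    covers n_skills * H env steps with only n_skills model calls.
    """
    mean = torch.zeros(n_skills, SKILL_DIM)
    std = torch.ones(n_skills, SKILL_DIM)
    for _ in range(iters):
        # Sample a population of candidate skill sequences.
        zs = mean + std * torch.randn(pop, n_skills, SKILL_DIM)
        s = s0.expand(pop, STATE_DIM)
        ret = torch.zeros(pop)
        for t in range(n_skills):              # imagined skill-level rollout
            s = dynamics(s, zs[:, t])
            ret = ret + reward_fn(s)
        elite = zs[ret.topk(elites).indices]   # refit to the best candidates
        mean, std = elite.mean(0), elite.std(0) + 1e-6
    # Return the first skill of the best plan; in SkiMo-style methods a
    # low-level skill policy would decode it into primitive actions.
    return mean[0]

# Toy usage: plan toward a random goal state with a dense distance reward.
dyn = SkillDynamics()
goal = torch.randn(STATE_DIM)
reward = lambda s: -(s - goal).norm(dim=-1)
first_skill = plan_skills(dyn, reward, torch.zeros(STATE_DIM))
```

The design point the sketch illustrates is the one the abstract argues for: the planner's cost scales with the number of skills, not the number of environment steps, which is what lets a model-based planner reach long horizons without accumulating per-step prediction error.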