Paper Title
Guiding Robot Exploration in Reinforcement Learning via Automated Planning
Paper Authors
Paper Abstract
Reinforcement learning (RL) enables an agent to learn from trial-and-error experiences toward achieving long-term goals; automated planning aims to compute plans for accomplishing tasks using action knowledge. Despite their shared goal of completing complex tasks, the development of RL and automated planning has been largely isolated due to their different computational modalities. Focusing on improving RL agents' learning efficiency, we develop Guided Dyna-Q (GDQ) to enable RL agents to reason with action knowledge to avoid exploring less-relevant states. The action knowledge is used for generating artificial experiences from an optimistic simulation. GDQ has been evaluated in simulation and using a mobile robot conducting navigation tasks in a multi-room office environment. Compared with competitive baselines, GDQ significantly reduces the effort in exploration while improving the quality of learned policies.
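The idea of generating artificial experiences from an optimistic simulation can be illustrated with a toy sketch. The code below is not the GDQ algorithm from the paper; it is a minimal tabular Q-learning example under stated assumptions: a hypothetical one-dimensional corridor of rooms, a breadth-first planner standing in for the automated planner, and a made-up `slip` probability standing in for real-world action failures. All names (`knowledge_next`, `plan`, `seed_with_plan`, `real_step`, `run`) are illustrative only.

```python
import random
from collections import defaultdict, deque

# Hypothetical 1-D corridor of rooms 0..N_ROOMS-1; the goal is the last room.
N_ROOMS = 6
GOAL = N_ROOMS - 1
ACTIONS = (-1, +1)  # move left / move right


def knowledge_next(state, action):
    """Action knowledge: the declared (optimistic) effect of an action."""
    return min(max(state + action, 0), N_ROOMS - 1)


def plan(start, goal):
    """Breadth-first search over the action knowledge.

    Returns the plan as a list of (state, action, next_state) steps.
    Assumes the goal is reachable, which always holds in this corridor.
    """
    parent = {start: None}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        if s == goal:
            break
        for a in ACTIONS:
            s2 = knowledge_next(s, a)
            if s2 not in parent:
                parent[s2] = (s, a)
                queue.append(s2)
    steps, s = [], goal
    while parent[s] is not None:
        prev, a = parent[s]
        steps.append((prev, a, s))
        s = prev
    return steps[::-1]


def q_update(Q, s, a, r, s2, alpha=0.5, gamma=0.9):
    """One tabular Q-learning backup; the goal state is terminal."""
    best_next = 0.0 if s2 == GOAL else max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def seed_with_plan(Q, start, sweeps=20):
    """Replay artificial experiences obtained by simulating the plan optimistically."""
    experiences = [(s, a, 1.0 if s2 == GOAL else 0.0, s2)
                   for s, a, s2 in plan(start, GOAL)]
    for _ in range(sweeps):
        for s, a, r, s2 in reversed(experiences):
            q_update(Q, s, a, r, s2)
    return experiences


def real_step(state, action, rng, slip=0.1):
    """Stand-in for the real environment: an action occasionally has no effect."""
    if rng.random() < slip:
        return state, 0.0
    s2 = knowledge_next(state, action)
    return s2, (1.0 if s2 == GOAL else 0.0)


def run(episodes=30, epsilon=0.1, seed=0):
    """Epsilon-greedy Q-learning that starts from the plan-seeded Q-table."""
    rng = random.Random(seed)
    Q = defaultdict(float)
    seed_with_plan(Q, start=0)
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap episode length
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda b: Q[(s, b)])
            s2, r = real_step(s, a, rng)
            q_update(Q, s, a, r, s2)
            s = s2
            if s == GOAL:
                break
    return Q
```

In this sketch, seeding the Q-table before any real interaction biases the greedy policy toward the planned route, so the agent spends fewer real steps in states the plan never visits. Real experience can still correct the values wherever the optimistic model turns out to be wrong.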