Paper Title

Meta Reinforcement Learning with Successor Feature Based Context

Authors

Xu Han, Feng Wu

Abstract

Most reinforcement learning (RL) methods focus only on learning a single task from scratch and are unable to use prior knowledge to learn other tasks more effectively. Context-based meta-RL techniques have recently been proposed as a possible solution to this problem. However, they are usually less efficient than conventional RL and may require many trial-and-error interactions during training. To address this, we propose a novel meta-RL approach that achieves competitive performance compared to existing meta-RL algorithms while requiring significantly fewer environment interactions. By combining context variables with the idea of decomposing the reward in the successor feature framework, our method not only learns high-quality policies for multiple tasks simultaneously but can also quickly adapt to new tasks with a small amount of training. Compared with state-of-the-art meta-RL baselines, we empirically show the effectiveness and data efficiency of our method on several continuous control tasks.
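
For background, the reward decomposition that the abstract refers to is typically written as in the standard successor feature literature; the sketch below uses generic symbols (feature map φ, task weight vector w) and is not necessarily the exact formulation of this paper, and how the learned context variable conditions w is an assumption about the paper's construction.

% Generic successor feature (SF) decomposition, a sketch rather than the paper's exact model.
% The reward is assumed linear in a feature map \phi with task-specific weights w:
\[
  r(s, a, s') = \phi(s, a, s')^{\top} w
\]
% Successor features accumulate expected discounted features under policy \pi:
\[
  \psi^{\pi}(s, a) = \mathbb{E}^{\pi}\!\left[\sum_{i=0}^{\infty} \gamma^{i}\,
      \phi(s_{t+i}, a_{t+i}, s_{t+i+1}) \,\middle|\, s_t = s,\ a_t = a\right]
\]
% so the action-value factorizes into a dynamics term and a reward term:
\[
  Q^{\pi}(s, a) = \psi^{\pi}(s, a)^{\top} w
\]

Under this factorization, each task can be identified with its own weight vector w while ψ captures shared dynamics, which is presumably what allows a context-conditioned agent to adapt to a new task by re-estimating w from a small amount of experience.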
