Title

Meta-Model-Based Meta-Policy Optimization

Authors

Takuya Hiraoka, Takahisa Imagawa, Voot Tangkaratt, Takayuki Osa, Takashi Onishi, Yoshimasa Tsuruoka

Abstract

Model-based meta-reinforcement learning (RL) methods have recently been shown to be a promising approach to improving the sample efficiency of RL in multi-task settings. However, the theoretical understanding of those methods is yet to be established, and there is currently no theoretical guarantee of their performance in a real-world environment. In this paper, we analyze the performance guarantee of model-based meta-RL methods by extending the theorems proposed by Janner et al. (2019). On the basis of our theoretical results, we propose Meta-Model-Based Meta-Policy Optimization (M3PO), a model-based meta-RL method with a performance guarantee. We demonstrate that M3PO outperforms existing meta-RL methods in continuous-control benchmarks.
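
The abstract refers to model-based meta-RL in the spirit of MBPO (Janner et al., 2019): collect real experience across tasks, fit a shared dynamics model, and improve policies with short model-generated rollouts. The sketch below is a minimal toy illustration of that general recipe only; the 1-D tasks, linear dynamics model, linear policy class, and all hyperparameters are illustrative assumptions, and this is not the authors' M3PO implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task(goal):
    """Toy 1-D task: the action nudges the state; reward is -|state - goal|."""
    def step(state, action):
        next_state = state + 0.1 * action + 0.01 * rng.normal()
        return next_state, -abs(next_state - goal)
    return step

goals = (-1.0, 0.0, 1.0)                      # hypothetical task distribution
tasks = [make_task(g) for g in goals]

def fit_meta_model(transitions):
    """Fit one shared (meta) dynamics model on data pooled across all tasks."""
    X = np.array([[s, a, 1.0] for s, a, _ in transitions])
    y = np.array([s2 for _, _, s2 in transitions])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return lambda s, a: float(np.array([s, a, 1.0]) @ w)

thetas = [0.0] * len(tasks)                   # per-task linear policy gains
real_data = []

for _ in range(20):                           # outer meta-training iterations
    # 1) Collect a small amount of real experience in every task.
    for k, step_fn in enumerate(tasks):
        s = 0.0
        for _ in range(10):
            a = thetas[k] * (goals[k] - s) + 0.5 * rng.normal()  # exploratory action
            s2, _ = step_fn(s, a)
            real_data.append((s, a, s2))
            s = s2

    # 2) Refit the shared dynamics model on all real data so far.
    model = fit_meta_model(real_data)

    # 3) Improve each task's policy using only short model-generated rollouts,
    #    keeping rollouts short (as in MBPO) to limit compounding model error.
    for k in range(len(tasks)):
        best_theta, best_ret = thetas[k], -np.inf
        for cand in np.linspace(0.0, 5.0, 11):                   # crude policy search
            ret, s = 0.0, 0.5
            for _ in range(5):                                   # short model rollout
                a = cand * (goals[k] - s)
                s = model(s, a)
                ret += -abs(s - goals[k])
            if ret > best_ret:
                best_theta, best_ret = cand, ret
        thetas[k] = best_theta

print("learned per-task policy gains:", [round(t, 2) for t in thetas])
```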
