Title

Meta-Model-Based Meta-Policy Optimization

Authors

Takuya Hiraoka, Takahisa Imagawa, Voot Tangkaratt, Takayuki Osa, Takashi Onishi, Yoshimasa Tsuruoka

Abstract

Model-based meta-reinforcement learning (RL) methods have recently been shown to be a promising approach to improving the sample efficiency of RL in multi-task settings. However, the theoretical understanding of those methods is yet to be established, and there is currently no theoretical guarantee of their performance in a real-world environment. In this paper, we analyze the performance guarantee of model-based meta-RL methods by extending the theorems proposed by Janner et al. (2019). On the basis of our theoretical results, we propose Meta-Model-Based Meta-Policy Optimization (M3PO), a model-based meta-RL method with a performance guarantee. We demonstrate that M3PO outperforms existing meta-RL methods in continuous-control benchmarks.
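
The abstract refers to model-based meta-RL in the spirit of MBPO (Janner et al., 2019): collect real experience across tasks, fit a shared dynamics model, and improve policies with short model-generated rollouts. The sketch below is a minimal toy illustration of that general recipe only; the 1-D tasks, linear dynamics model, linear policy class, and all hyperparameters are illustrative assumptions, and this is not the authors' M3PO implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task(goal):
    """Toy 1-D task: the action nudges the state; reward is -|state - goal|."""
    def step(state, action):
        next_state = state + 0.1 * action + 0.01 * rng.normal()
        return next_state, -abs(next_state - goal)
    return step

goals = (-1.0, 0.0, 1.0)                      # hypothetical task distribution
tasks = [make_task(g) for g in goals]

def fit_meta_model(transitions):
    """Fit one shared (meta) dynamics model on data pooled across all tasks."""
    X = np.array([[s, a, 1.0] for s, a, _ in transitions])
    y = np.array([s2 for _, _, s2 in transitions])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return lambda s, a: float(np.array([s, a, 1.0]) @ w)

thetas = [0.0] * len(tasks)                   # per-task linear policy gains
real_data = []

for _ in range(20):                           # outer meta-training iterations
    # 1) Collect a small amount of real experience in every task.
    for k, step_fn in enumerate(tasks):
        s = 0.0
        for _ in range(10):
            a = thetas[k] * (goals[k] - s) + 0.5 * rng.normal()  # exploratory action
            s2, _ = step_fn(s, a)
            real_data.append((s, a, s2))
            s = s2

    # 2) Refit the shared dynamics model on all real data so far.
    model = fit_meta_model(real_data)

    # 3) Improve each task's policy using only short model-generated rollouts,
    #    keeping rollouts short (as in MBPO) to limit compounding model error.
    for k in range(len(tasks)):
        best_theta, best_ret = thetas[k], -np.inf
        for cand in np.linspace(0.0, 5.0, 11):                   # crude policy search
            ret, s = 0.0, 0.5
            for _ in range(5):                                   # short model rollout
                a = cand * (goals[k] - s)
                s = model(s, a)
                ret += -abs(s - goals[k])
            if ret > best_ret:
                best_theta, best_ret = cand, ret
        thetas[k] = best_theta

print("learned per-task policy gains:", [round(t, 2) for t in thetas])
```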
