Paper Title

Meta Reinforcement Learning with Finite Training Tasks -- a Density Estimation Approach

Paper Authors

Zohar Rimon, Aviv Tamar, Gilad Adler

Paper Abstract

In meta reinforcement learning (meta RL), an agent learns from a set of training tasks how to quickly solve a new task, drawn from the same task distribution. The optimal meta RL policy, a.k.a. the Bayes-optimal behavior, is well defined, and guarantees optimal reward in expectation, taken with respect to the task distribution. The question we explore in this work is how many training tasks are required to guarantee approximately optimal behavior with high probability. Recent work provided the first such PAC analysis for a model-free setting, where a history-dependent policy was learned from the training tasks. In this work, we propose a different approach: directly learn the task distribution, using density estimation techniques, and then train a policy on the learned task distribution. We show that our approach leads to bounds that depend on the dimension of the task distribution. In particular, in settings where the task distribution lies in a low-dimensional manifold, we extend our analysis to use dimensionality reduction techniques and account for such structure, obtaining significantly better bounds than previous work, which strictly depend on the number of states and actions. The key to our approach is the regularization implied by the kernel density estimation method. We further demonstrate that this regularization is useful in practice, when 'plugged in' to the state-of-the-art VariBAD meta RL algorithm.
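
To make the density estimation step concrete, the sketch below is a minimal illustration (an assumption on our part, not the paper's implementation): it fits a Gaussian kernel density estimate to a finite set of task parameters and then samples tasks from the smoothed distribution; drawing meta-training tasks from the KDE rather than reusing the raw training tasks is the regularization the abstract refers to. The helper names `fit_kde` and `sample_tasks` and the rule-of-thumb bandwidth are illustrative, not the quantities analyzed in the paper.

```python
# Minimal sketch (not the authors' code): estimate the task distribution
# from a finite set of training tasks with a Gaussian kernel density
# estimator (KDE), then draw smoothed tasks from it for meta-training.
# Task parameters are assumed to be d-dimensional real vectors.
import numpy as np


def fit_kde(train_tasks, bandwidth=None):
    """Fit an isotropic Gaussian KDE to task parameters of shape (n, d)."""
    n, d = train_tasks.shape
    if bandwidth is None:
        # Scott's rule of thumb; the paper derives its own bandwidth choice.
        bandwidth = n ** (-1.0 / (d + 4))
    return train_tasks, float(bandwidth)


def sample_tasks(kde, num_samples, rng=None):
    """Sample tasks from the KDE: pick a training task, add Gaussian noise."""
    centers, h = kde
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.integers(len(centers), size=num_samples)
    noise = rng.normal(scale=h, size=(num_samples, centers.shape[1]))
    return centers[idx] + noise


# Usage: the sampled (smoothed) tasks stand in for the raw training tasks
# when training the history-dependent meta RL policy, e.g. inside VariBAD.
train_tasks = np.random.default_rng(0).uniform(size=(20, 2))  # 20 tasks, 2 params
kde = fit_kde(train_tasks)
meta_training_tasks = sample_tasks(kde, num_samples=1000)
```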
