Paper Title
Robust Task Representations for Offline Meta-Reinforcement Learning via Contrastive Learning
Paper Authors
Paper Abstract
We study offline meta-reinforcement learning, a practical reinforcement learning paradigm that learns from offline data to adapt to new tasks. The distribution of offline data is determined jointly by the behavior policy and the task. Existing offline meta-reinforcement learning algorithms cannot distinguish these factors, making task representations unstable to the change of behavior policies. To address this problem, we propose a contrastive learning framework for task representations that are robust to the distribution mismatch of behavior policies in training and test. We design a bi-level encoder structure, use mutual information maximization to formalize task representation learning, derive a contrastive learning objective, and introduce several approaches to approximate the true distribution of negative pairs. Experiments on a variety of offline meta-reinforcement learning benchmarks demonstrate the advantages of our method over prior methods, especially on the generalization to out-of-distribution behavior policies. The code is available at https://github.com/PKU-AI-Edge/CORRO.
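The abstract mentions a bi-level encoder, mutual information maximization, and a derived contrastive objective. Below is a minimal sketch of a generic InfoNCE-style contrastive loss over transition encodings, intended only to illustrate the kind of objective described; it is not the authors' CORRO implementation. The names (TransitionEncoder, info_nce_loss, embed_dim) and architecture choices are illustrative assumptions, and the paper's specific methods for approximating the distribution of negative pairs are not reproduced here.

```python
# Illustrative sketch (not the authors' implementation) of an InfoNCE-style
# contrastive objective for task representation learning. A transition-level
# encoder maps (s, a, r, s') to a latent; the loss pulls together encodings
# of transitions from the same task and pushes apart those from other tasks.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TransitionEncoder(nn.Module):
    """Hypothetical encoder for a single transition (s, a, r, s')."""

    def __init__(self, state_dim: int, action_dim: int, embed_dim: int = 64):
        super().__init__()
        in_dim = 2 * state_dim + action_dim + 1  # s, a, scalar reward, s'
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, embed_dim),
        )

    def forward(self, s, a, r, s_next):
        # r is expected to have shape (batch, 1)
        x = torch.cat([s, a, r, s_next], dim=-1)
        return self.net(x)


def info_nce_loss(anchor_z, positive_z, negative_z, temperature: float = 0.1):
    """InfoNCE loss.

    anchor_z, positive_z: (B, D) encodings from the same task.
    negative_z: (B, K, D) encodings of K negatives per anchor
    (e.g., transitions drawn from other tasks).
    """
    anchor_z = F.normalize(anchor_z, dim=-1)
    positive_z = F.normalize(positive_z, dim=-1)
    negative_z = F.normalize(negative_z, dim=-1)

    # Similarity of each anchor to its positive and to its K negatives.
    pos_logits = (anchor_z * positive_z).sum(-1, keepdim=True) / temperature      # (B, 1)
    neg_logits = torch.einsum('bd,bkd->bk', anchor_z, negative_z) / temperature   # (B, K)
    logits = torch.cat([pos_logits, neg_logits], dim=1)                           # (B, 1+K)

    # The positive sits at index 0 of every row.
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)


# Example usage with random tensors (batch of 32, 8 negatives per anchor).
enc = TransitionEncoder(state_dim=4, action_dim=2)
s, a = torch.randn(32, 4), torch.randn(32, 2)
r, s2 = torch.randn(32, 1), torch.randn(32, 4)
anchor = enc(s, a, r, s2)
positive = enc(torch.randn(32, 4), torch.randn(32, 2), torch.randn(32, 1), torch.randn(32, 4))
negatives = torch.randn(32, 8, 64)  # placeholder for encodings of other-task transitions
loss = info_nce_loss(anchor, positive, negatives)
```

Maximizing agreement with positives while contrasting against negatives is one standard way to lower-bound the mutual information between the latent and the task; the quality of the bound depends on how faithfully the negatives approximate the true negative-pair distribution, which is the issue the paper's several approximation approaches address.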