Paper Title
Task-Agnostic Online Reinforcement Learning with an Infinite Mixture of Gaussian Processes
Paper Authors
Paper Abstract
Continuously learning to solve unseen tasks with limited experience has been extensively pursued in meta-learning and continual learning, but with restricted assumptions such as accessible task distributions, independently and identically distributed tasks, and clear task delineations. However, real-world physical tasks frequently violate these assumptions, resulting in performance degradation. This paper proposes a continual online model-based reinforcement learning approach that does not require pre-training to solve task-agnostic problems with unknown task boundaries. We maintain a mixture of experts to handle nonstationarity, and represent each different type of dynamics with a Gaussian Process to efficiently leverage collected data and expressively model uncertainty. We propose a transition prior to account for the temporal dependencies in streaming data and update the mixture online via sequential variational inference. Our approach reliably handles the task distribution shift by generating new models for never-before-seen dynamics and reusing old models for previously seen dynamics. In experiments, our approach outperforms alternative methods in non-stationary tasks, including classic control with changing dynamics and decision making in different driving scenarios.
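To make the mechanism concrete, below is a minimal Python sketch of the online expert-assignment loop the abstract describes. It is illustrative only: a greedy MAP assignment stands in for the paper's sequential variational inference, scikit-learn's GaussianProcessRegressor substitutes for the paper's GP dynamics models, and the class and parameter names (InfiniteGPMixture, alpha, stickiness) are hypothetical rather than taken from the paper.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

class InfiniteGPMixture:
    """Illustrative sticky mixture of GP dynamics experts (not the authors' code)."""

    def __init__(self, alpha=1.0, stickiness=5.0):
        self.alpha = alpha            # concentration: propensity to spawn a new expert
        self.stickiness = stickiness  # self-transition bias (the temporal transition prior)
        self.experts = []             # one GP dynamics model per mode
        self.data = []                # (inputs, targets) buffer per expert
        self.prev_k = None            # expert active at the previous time step

    def _log_pred(self, k, x, y):
        # Gaussian predictive log-likelihood of transition (x, y) under expert k.
        mu, std = self.experts[k].predict(x[None], return_std=True)
        std = np.maximum(std, 1e-6)
        return -0.5 * np.sum(((y - mu[0]) / std) ** 2 + np.log(2 * np.pi * std ** 2))

    def step(self, x, y):
        """Assign one streaming transition (x = (s, a), y = s') and update that expert."""
        K = len(self.experts)
        scores = np.full(K + 1, np.log(self.alpha))  # last slot: spawn a new expert
        for k in range(K):
            n_k = len(self.data[k][1])
            # CRP-style prior with a sticky bonus for the previously active expert.
            prior = np.log(n_k + (self.stickiness if k == self.prev_k else 0.0))
            scores[k] = prior + self._log_pred(k, x, y)
        k = int(np.argmax(scores))    # MAP assignment (stand-in for variational updates)
        if k == K:                    # never-before-seen dynamics: create a fresh GP
            self.experts.append(GaussianProcessRegressor(kernel=RBF() + WhiteKernel()))
            self.data.append(([], []))
        self.data[k][0].append(x)
        self.data[k][1].append(y)
        # Refit from scratch for simplicity; an online method would update incrementally.
        self.experts[k].fit(np.array(self.data[k][0]), np.array(self.data[k][1]))
        self.prev_k = k
        return k
```

In this sketch the stickiness term plays the role of the transition prior from the abstract, biasing assignment toward the expert that explained the previous transition, while the alpha slot lets the mixture generate a new GP whenever no existing expert explains the incoming dynamics and reuse an old one when previously seen dynamics recur.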