Transfer RL across Observation Feature Spaces via Model-Based Regularization

Paper Title

Transfer RL across Observation Feature Spaces via Model-Based Regularization

Authors

Sun, Yanchao; Zheng, Ruijie; Wang, Xiyao; Cohen, Andrew; Huang, Furong

Abstract

In many reinforcement learning (RL) applications, the observation space is specified by human developers and restricted by physical realizations, and may thus be subject to dramatic changes over time (e.g. increased number of observable features). However, when the observation space changes, the previous policy will likely fail due to the mismatch of input features, and another policy must be trained from scratch, which is inefficient in terms of computation and sample complexity. Following theoretical insights, we propose a novel algorithm which extracts the latent-space dynamics in the source task, and transfers the dynamics model to the target task to use as a model-based regularizer. Our algorithm works for drastic changes of observation space (e.g. from vector-based observation to image-based observation), without any inter-task mapping or any prior knowledge of the target task. Empirical results show that our algorithm significantly improves the efficiency and stability of learning in the target task.
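The abstract describes reusing a source-task latent dynamics model as a regularizer while learning in the target task. The following is a minimal sketch of that idea, not the authors' implementation. It assumes linear maps for both the encoder and the dynamics model, a squared-error penalty, and a hypothetical weighting coefficient `reg_coef`. All function and parameter names are illustrative.

```python
def linear(weights, x):
    """Apply a weight matrix (a list of rows) to a vector (a list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def dynamics_regularizer(encoder_w, dynamics_w, obs, action, next_obs):
    """Latent prediction error for one target-task transition.

    encoder_w  : trainable target-task encoder, observation -> latent (assumed linear)
    dynamics_w : dynamics model learned in the source task and kept frozen,
                 [latent, action] -> next latent (assumed linear)
    """
    z = linear(encoder_w, obs)               # latent of the current observation
    z_next = linear(encoder_w, next_obs)     # latent of the next observation
    z_pred = linear(dynamics_w, z + action)  # source dynamics applied in latent space
    return sum((p - t) ** 2 for p, t in zip(z_pred, z_next))


def total_loss(rl_loss, reg_loss, reg_coef=1.0):
    """Combine the usual RL loss with the model-based penalty (hypothetical weighting)."""
    return rl_loss + reg_coef * reg_loss


# Toy transition: identity encoder, and dynamics that add the action to the first latent.
encoder_w = [[1.0, 0.0], [0.0, 1.0]]
dynamics_w = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
consistent = dynamics_regularizer(encoder_w, dynamics_w, [1.0, 2.0], [0.5], [1.5, 2.0])
inconsistent = dynamics_regularizer(encoder_w, dynamics_w, [1.0, 2.0], [0.5], [2.5, 2.0])
```

In this sketch only the encoder would be updated in the target task. The penalty is zero when the encoded transition agrees with the transferred dynamics and grows as the two disagree, which is one way a frozen dynamics model can shape a new encoder without any inter-task mapping.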
