Paper Title
Deep Reinforcement Learning amidst Lifelong Non-Stationarity
Paper Authors
Paper Abstract
As humans, our goals and our environment are persistently changing throughout our lifetime based on our experiences, actions, and internal and external drives. In contrast, typical reinforcement learning problem set-ups consider decision processes that are stationary across episodes. Can we develop reinforcement learning algorithms that can cope with the persistent change in the former, more realistic problem settings? While on-policy algorithms such as policy gradients in principle can be extended to non-stationary settings, the same cannot be said for more efficient off-policy algorithms that replay past experiences when learning. In this work, we formalize this problem setting, and draw upon ideas from the online learning and probabilistic inference literature to derive an off-policy RL algorithm that can reason about and tackle such lifelong non-stationarity. Our method leverages latent variable models to learn a representation of the environment from current and past experiences, and performs off-policy RL with this representation. We further introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
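The abstract describes conditioning an off-policy agent on a latent representation of the (possibly shifting) environment, inferred from current and past experience. Below is a minimal, hypothetical sketch of that idea, not the authors' implementation: a recurrent encoder summarizes recent transitions into a latent z, and a policy conditions on both the observation and z. All module names, dimensions, and architectural choices here are illustrative assumptions.

```python
import torch
import torch.nn as nn


class ExperienceEncoder(nn.Module):
    """Infers a latent environment representation z from past transitions."""

    def __init__(self, obs_dim, act_dim, latent_dim, hidden_dim=64):
        super().__init__()
        # Each step of the input sequence is (observation, action, reward).
        self.rnn = nn.GRU(obs_dim + act_dim + 1, hidden_dim, batch_first=True)
        self.mean = nn.Linear(hidden_dim, latent_dim)
        self.log_std = nn.Linear(hidden_dim, latent_dim)

    def forward(self, obs, act, rew):
        # obs: (B, T, obs_dim), act: (B, T, act_dim), rew: (B, T, 1)
        h, _ = self.rnn(torch.cat([obs, act, rew], dim=-1))
        h_last = h[:, -1]  # summary of the experience observed so far
        mean, log_std = self.mean(h_last), self.log_std(h_last)
        # Reparameterized sample of the latent, as in a standard VAE-style model.
        z = mean + log_std.exp() * torch.randn_like(mean)
        return z, mean, log_std


class LatentConditionedPolicy(nn.Module):
    """Policy that acts on the current observation and the inferred latent."""

    def __init__(self, obs_dim, latent_dim, act_dim, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, act_dim), nn.Tanh(),
        )

    def forward(self, obs, z):
        return self.net(torch.cat([obs, z], dim=-1))
```

In a sketch like this, the latent inferred for each segment of experience could be stored alongside transitions in the replay buffer, so that an off-policy learner replays old data together with the environment representation under which it was collected; the exact training objective (e.g., the variational bound used in the paper) is not reproduced here.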