Paper Title

A Look at Value-Based Decision-Time vs. Background Planning Methods Across Different Settings

Paper Authors

Safa Alver, Doina Precup

Paper Abstract

In model-based reinforcement learning (RL), an agent can leverage a learned model to improve its way of behaving in different ways. Two of the prevalent ways to do this are through decision-time and background planning methods. In this study, we are interested in understanding how the value-based versions of these two planning methods will compare against each other across different settings. Towards this goal, we first consider the simplest instantiations of value-based decision-time and background planning methods and provide theoretical results on which one will perform better in the regular RL and transfer learning settings. Then, we consider the modern instantiations of them and provide hypotheses on which one will perform better in the same settings. Finally, we perform illustrative experiments to validate these theoretical results and hypotheses. Overall, our findings suggest that even though value-based versions of the two planning methods perform on par in their simplest instantiations, the modern instantiations of value-based decision-time planning methods can perform on par with or better than the modern instantiations of value-based background planning methods in both the regular RL and transfer learning settings.
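The abstract contrasts decision-time and background planning but does not spell out the mechanics here. As a rough tabular illustration of the distinction only (a minimal sketch with assumed placeholder names such as `Q`, `model`, `decision_time_plan`, and `background_plan`, not the instantiations studied in the paper):

```python
import random
from collections import defaultdict

# Illustrative sketch only: a tabular value function and a learned
# deterministic model, used in two different ways for planning.
GAMMA, ALPHA = 0.95, 0.1

Q = defaultdict(float)   # Q[(state, action)] -> value estimate
model = {}               # learned model: (state, action) -> (reward, next_state)

def decision_time_plan(state, actions, depth=3):
    """Decision-time planning: when an action is needed, unroll the learned
    model a few steps ahead of the current state and act greedily on the
    backed-up values (the model is consulted at action-selection time)."""
    def lookahead(s, d):
        if d == 0 or not any((s, a) in model for a in actions):
            return max(Q[(s, a)] for a in actions)
        vals = []
        for a in actions:
            if (s, a) in model:
                r, s_next = model[(s, a)]
                vals.append(r + GAMMA * lookahead(s_next, d - 1))
            else:
                vals.append(Q[(s, a)])
        return max(vals)

    scores = {}
    for a in actions:
        if (state, a) in model:
            r, s_next = model[(state, a)]
            scores[a] = r + GAMMA * lookahead(s_next, depth - 1)
        else:
            scores[a] = Q[(state, a)]
    return max(scores, key=scores.get)

def background_plan(actions, n_updates=10):
    """Background planning (Dyna-style): between real environment steps,
    sample simulated transitions from the learned model and use them to
    update Q directly (the model is consulted in the background, not at
    action-selection time)."""
    for _ in range(n_updates):
        (s, a), (r, s_next) = random.choice(list(model.items()))
        target = r + GAMMA * max(Q[(s_next, b)] for b in actions)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
```

The design difference the sketch highlights is where model computation is spent: decision-time planning spends it at the moment an action must be chosen, while background planning spends it between interactions to improve the stored value estimates that a later greedy action selection reads off.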
