Title

DeepAveragers: Offline Reinforcement Learning by Solving Derived Non-Parametric MDPs

Authors

Aayam Shrestha, Stefan Lee, Prasad Tadepalli, Alan Fern

Abstract

We study an approach to offline reinforcement learning (RL) based on optimally solving finitely-represented MDPs derived from a static dataset of experience. This approach can be applied on top of any learned representation and has the potential to easily support multiple solution objectives as well as zero-shot adjustment to changing environments and goals. Our main contribution is to introduce the Deep Averagers with Costs MDP (DAC-MDP) and to investigate its solutions for offline RL. DAC-MDPs are a non-parametric model that can leverage deep representations and account for limited data by introducing costs for exploiting under-represented parts of the model. In theory, we show conditions that allow for lower-bounding the performance of DAC-MDP solutions. We also investigate the empirical behavior in a number of environments, including those with image-based observations. Overall, the experiments demonstrate that the framework can work in practice and scale to large complex offline RL problems.
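The construction the abstract describes can be made concrete with a small sketch: a finite MDP whose states are the dataset transitions, with outgoing transitions defined by k-nearest-neighbor averaging in the learned representation space and rewards penalized by a distance-based cost, so that the planner avoids under-represented regions. The details below (Euclidean kNN, a linear distance penalty, plain value iteration, and names such as `build_and_solve_dac_mdp`, `cost_coeff`, and `k`) are illustrative assumptions for this sketch, not the authors' released implementation.

```python
import numpy as np

def build_and_solve_dac_mdp(states, actions, rewards, next_states,
                            n_actions, k=5, cost_coeff=1.0,
                            gamma=0.99, n_iters=500):
    """Hedged sketch of a DAC-MDP-style construction.

    states/next_states: (N, d) learned representations of dataset
    transitions; actions: (N,) ints; rewards: (N,) floats.
    Assumes each action appears at least k times in the dataset.
    The penalty form and all names here are assumptions.
    """
    states, actions = np.asarray(states), np.asarray(actions)
    rewards, next_states = np.asarray(rewards), np.asarray(next_states)
    N = len(states)

    # Group dataset transitions by the action they took.
    by_action = [np.flatnonzero(actions == a) for a in range(n_actions)]

    # For each dataset next-state and each action, the k nearest dataset
    # states that took that action define the transition distribution;
    # their distances define the cost penalty on the reward.
    nbr_idx = np.zeros((N, n_actions, k), dtype=int)
    nbr_pen = np.zeros((N, n_actions, k))
    for a in range(n_actions):
        cand = by_action[a]
        # Euclidean distances from every next_state to every candidate.
        d = np.linalg.norm(
            next_states[:, None, :] - states[cand][None, :, :], axis=-1)
        nearest = np.argsort(d, axis=1)[:, :k]
        nbr_idx[:, a] = cand[nearest]
        nbr_pen[:, a] = cost_coeff * np.take_along_axis(d, nearest, axis=1)

    # Plain value iteration over the N transition-states.
    V = np.zeros(N)
    for _ in range(n_iters):
        # Q[i, a]: average over the k neighbors j of
        # (r_j - cost * dist) + gamma * V[j].
        Q = np.mean(rewards[nbr_idx] - nbr_pen + gamma * V[nbr_idx], axis=-1)
        V = Q.max(axis=1)
    # Greedy policy: at a new observation, act by argmax_a Q at the
    # nearest dataset state in representation space.
    return V, Q
```

Raising `cost_coeff` makes the derived MDP more pessimistic about sparsely covered regions, while `cost_coeff = 0` reduces the sketch to unpenalized kNN averaging; in the paper's framing, such cost terms are what let the solution's performance be lower-bounded despite limited data.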
