Dynamic Horizon Value Estimation for Model-based Reinforcement Learning

Paper Title

Dynamic Horizon Value Estimation for Model-based Reinforcement Learning

Authors

Junjie Wang, Qichao Zhang, Dongbin Zhao, Mengchen Zhao, Jianye Hao

Abstract

Existing model-based value expansion methods typically use a world model with a fixed rollout horizon to estimate values and assist policy learning. However, fixed-horizon rollouts with an inaccurate model can harm the learning process. In this paper, we investigate adaptively using model knowledge for value expansion. We propose a novel method called Dynamic-horizon Model-based Value Expansion (DMVE), which adjusts world model usage by varying the rollout horizon. Inspired by reconstruction-based techniques that can be applied to visual data novelty detection, we use a world model with a reconstruction module for image feature extraction in order to obtain more precise value estimates. Both the raw and the reconstructed images are used to determine the appropriate horizon for adaptive value expansion. On several benchmark visual control tasks, experimental results show that DMVE outperforms all baselines in sample efficiency and final performance, indicating that DMVE can achieve more effective and accurate value estimation than state-of-the-art model-based methods.
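
To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows a standard H-step value expansion target (discounted imagined rewards plus a discounted bootstrap value) and one hypothetical way a reconstruction error could pick among candidate horizons. The function names, the mean-squared-error measure, and the "lowest error wins" rule are all assumptions made for illustration; the abstract does not specify how DMVE selects the horizon.

```python
import numpy as np


def reconstruction_error(raw_image, reconstructed_image):
    """Mean squared error between a raw image and its reconstruction.

    Assumption: MSE is used here only as an illustrative discrepancy measure.
    """
    raw = np.asarray(raw_image, dtype=float)
    recon = np.asarray(reconstructed_image, dtype=float)
    return float(np.mean((raw - recon) ** 2))


def value_expansion(rewards, bootstrap_value, gamma=0.99):
    """H-step value expansion target, where H = len(rewards).

    Computes sum_{t<H} gamma^t * r_t + gamma^H * V(s_H), with `rewards`
    coming from an imagined model rollout and `bootstrap_value` = V(s_H).
    """
    horizon = len(rewards)
    discounts = gamma ** np.arange(horizon)
    return float(np.dot(discounts, rewards) + gamma ** horizon * bootstrap_value)


def pick_horizon(errors_by_horizon):
    """Hypothetical selection rule: choose the horizon with the lowest error.

    `errors_by_horizon` maps a candidate horizon to a reconstruction error.
    """
    return min(errors_by_horizon, key=errors_by_horizon.get)
```

In this sketch a caller would compute one reconstruction error per candidate horizon, call `pick_horizon`, and then build the value target with `value_expansion` using only the first H imagined rewards.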
