Paper Title

Reinforcement Learning as Iterative and Amortised Inference

Paper Authors

Beren Millidge, Alexander Tschantz, Anil K. Seth, Christopher L. Buckley

Paper Abstract

There are several ways to categorise reinforcement learning (RL) algorithms, such as either model-based or model-free, policy-based or planning-based, on-policy or off-policy, and online or offline. Broad classification schemes such as these help provide a unified perspective on disparate techniques and can contextualise and guide the development of new algorithms. In this paper, we utilise the control as inference framework to outline a novel classification scheme based on amortised and iterative inference. We demonstrate that a wide range of algorithms can be classified in this manner, providing a fresh perspective and highlighting a range of existing similarities. Moreover, we show that taking this perspective allows us to identify parts of the algorithmic design space which have been relatively unexplored, suggesting new routes to innovative RL algorithms.
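To make the abstract's central distinction concrete, the following is a minimal sketch (not taken from the paper) contrasting the two inference styles on a hypothetical one-dimensional control problem. Iterative inference re-optimises the action from scratch for every new state, here with a cross-entropy-method-style loop, while amortised inference trains a parametric policy once and then acts with a single cheap function evaluation. The toy reward, function names, and hyperparameters are all illustrative assumptions.

```python
import numpy as np

# Toy setting: a known quadratic reward over actions, conditioned on state.
# r(s, a) = -(a - s)^2, so the optimal action for state s is a = s.
def reward(state, action):
    return -(action - state) ** 2

# --- Iterative inference: optimise the action for each state from scratch ---
# A minimal cross-entropy-method-style loop; every new state pays the full
# cost of this per-decision optimisation.
def iterative_action(state, n_iters=20, pop=64, n_elite=8, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    mu, sigma = 0.0, 2.0
    for _ in range(n_iters):
        samples = rng.normal(mu, sigma, size=pop)
        # Keep the best-scoring samples and refit the sampling distribution.
        elites = samples[np.argsort(reward(state, samples))[-n_elite:]]
        mu, sigma = elites.mean(), elites.std() + 1e-6
    return mu

# --- Amortised inference: learn a direct state -> action mapping ---
# A linear "policy" a = w*s + b fitted by gradient ascent on the reward;
# after training, acting in any state is a single forward evaluation.
def train_amortised_policy(n_steps=500, lr=0.05, rng=None):
    if rng is None:
        rng = np.random.default_rng(1)
    w, b = 0.0, 0.0
    for _ in range(n_steps):
        s = rng.uniform(-3, 3)
        a = w * s + b
        grad = -2 * (a - s)   # d reward / d action
        w += lr * grad * s    # chain rule through the policy parameters
        b += lr * grad
    return lambda s: w * s + b

if __name__ == "__main__":
    policy = train_amortised_policy()
    for s in [-2.0, 0.5, 3.0]:
        print(f"state {s:+.1f}: iterative {iterative_action(s):+.3f}, "
              f"amortised {policy(s):+.3f}  (optimum {s:+.3f})")
```

The trade-off the paper's classification scheme turns on shows up directly in this sketch: the iterative routine spends its full optimisation budget on every decision (as planning-based methods do), whereas the amortised policy pays a one-off training cost and is then cheap at decision time (as policy-based methods are).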
