Paper Title

A Decentralized Policy Gradient Approach to Multi-task Reinforcement Learning

Authors

Sihan Zeng, Aqeel Anwar, Thinh Doan, Arijit Raychowdhury, Justin Romberg

Abstract


We develop a mathematical framework for solving multi-task reinforcement learning (MTRL) problems based on a type of policy gradient method. The goal in MTRL is to learn a common policy that operates effectively in different environments; these environments have similar (or overlapping) state spaces, but have different rewards and dynamics. We highlight two fundamental challenges in MTRL that are not present in its single task counterpart, and illustrate them with simple examples. We then develop a decentralized entropy-regularized policy gradient method for solving the MTRL problem, and study its finite-time convergence rate. We demonstrate the effectiveness of the proposed method using a series of numerical experiments. These experiments range from small-scale "GridWorld" problems that readily demonstrate the trade-offs involved in multi-task learning to large-scale problems, where common policies are learned to navigate an airborne drone in multiple (simulated) environments.
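The abstract names two ingredients: an entropy-regularized policy gradient step performed locally per task, and a decentralized scheme by which agents agree on a common policy. The sketch below is a minimal toy illustration of that combination, not the paper's algorithm: each of N agents holds a softmax policy over a one-state task (a bandit with its own reward vector, standing in for "different rewards and dynamics"), takes an exact entropy-regularized gradient step, then averages its parameters with the other agents via a doubly stochastic consensus matrix. The reward data, step size, and full-averaging matrix are all illustrative assumptions.

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

# Toy setup (assumed, not from the paper): N single-state tasks, A actions.
np.random.seed(0)
N, A = 3, 4
rewards = np.random.rand(N, A)   # per-task reward vectors (illustrative data)
tau = 0.1                        # entropy regularization weight
lr = 0.5                         # step size
W = np.full((N, N), 1.0 / N)     # doubly stochastic mixing matrix (full averaging)
theta = np.zeros((N, A))         # local policy parameters, one row per agent

for _ in range(500):
    # 1) Local entropy-regularized policy gradient step on each agent.
    grads = np.zeros_like(theta)
    for i in range(N):
        pi = softmax(theta[i])
        # Exact gradient of  E_pi[r_i] + tau * H(pi)  for a softmax policy:
        #   grad_k = pi_k * (r_k - pi.r) + tau * pi_k * (-(log pi_k - pi.log pi))
        adv = rewards[i] - pi @ rewards[i]
        ent = -(np.log(pi) - pi @ np.log(pi))
        grads[i] = pi * (adv + tau * ent)
    theta = theta + lr * grads
    # 2) Consensus: mix parameters with neighbors so agents converge on
    #    a single common policy across tasks.
    theta = W @ theta

common_pi = softmax(theta[0])    # identical across agents after full averaging
```

With full averaging the consensus step collapses all local parameters after each round, so the iteration reduces to gradient ascent on the average-task entropy-regularized objective; a sparser doubly stochastic W would model agents that only communicate with neighbors, which is the regime the decentralized analysis targets.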
