Paper Title
Generating Adjacency-Constrained Subgoals in Hierarchical Reinforcement Learning
Paper Authors
Paper Abstract
Goal-conditioned hierarchical reinforcement learning (HRL) is a promising approach for scaling up reinforcement learning (RL) techniques. However, it often suffers from training inefficiency as the action space of the high-level, i.e., the goal space, is often large. Searching in a large goal space poses difficulties for both high-level subgoal generation and low-level policy learning. In this paper, we show that this problem can be effectively alleviated by restricting the high-level action space from the whole goal space to a $k$-step adjacent region of the current state using an adjacency constraint. We theoretically prove that the proposed adjacency constraint preserves the optimal hierarchical policy in deterministic MDPs, and show that this constraint can be practically implemented by training an adjacency network that can discriminate between adjacent and non-adjacent subgoals. Experimental results on discrete and continuous control tasks show that incorporating the adjacency constraint improves the performance of state-of-the-art HRL approaches in both deterministic and stochastic environments.
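To make the abstract's mechanism concrete, below is a minimal PyTorch sketch of an adjacency network that discriminates $k$-step adjacent from non-adjacent state pairs and yields a penalty for constraining high-level subgoal generation. The class and function names (`AdjacencyNet`, `adjacency_loss`, `adjacency_penalty`), the hinge-style loss, and all hyperparameters are illustrative assumptions, not the authors' reference implementation.

```python
# Sketch of an adjacency network: states are embedded so that embedding
# distance reflects k-step reachability, and subgoals proposed outside the
# adjacent region of the current state are penalized.
import torch
import torch.nn as nn


class AdjacencyNet(nn.Module):
    """Embeds states so embedding distance approximates k-step adjacency."""

    def __init__(self, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, embed_dim),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)


def adjacency_loss(f: AdjacencyNet, s1: torch.Tensor, s2: torch.Tensor,
                   adjacent: torch.Tensor, eps: float = 1.0,
                   margin: float = 0.2) -> torch.Tensor:
    """Hinge loss: pull k-step adjacent pairs within eps, push
    non-adjacent pairs beyond eps + margin.

    s1, s2:   state batches of shape (B, state_dim)
    adjacent: 0/1 float labels of shape (B,)
    """
    d = torch.norm(f(s1) - f(s2), dim=-1)
    pos = adjacent * torch.clamp(d - eps, min=0.0)                  # adjacent but too far
    neg = (1.0 - adjacent) * torch.clamp(eps + margin - d, min=0.0)  # non-adjacent but too close
    return (pos + neg).mean()


def adjacency_penalty(f: AdjacencyNet, state: torch.Tensor,
                      subgoal: torch.Tensor, eps: float = 1.0) -> torch.Tensor:
    """Penalty term added to the high-level policy loss when the proposed
    subgoal falls outside the k-step adjacent region of the current state."""
    d = torch.norm(f(state) - f(subgoal), dim=-1)
    return torch.clamp(d - eps, min=0.0).mean()
```

In use, `adjacency_loss` would be minimized on state pairs labeled by their transition distance in collected trajectories, and `adjacency_penalty` would be weighted into the high-level objective so that generated subgoals stay within the adjacent region the paper describes.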