Paper Title

Sub-Goal Trees -- a Framework for Goal-Based Reinforcement Learning

Paper Authors

Tom Jurgenson, Or Avner, Edward Groshev, Aviv Tamar

Paper Abstract

Many AI problems, in robotics and other domains, are goal-based, essentially seeking trajectories leading to various goal states. Reinforcement learning (RL), building on Bellman's optimality equation, naturally optimizes for a single goal, yet can be made multi-goal by augmenting the state with the goal. Instead, we propose a new RL framework, derived from a dynamic programming equation for the all pairs shortest path (APSP) problem, which naturally solves multi-goal queries. We show that this approach has computational benefits for both standard and approximate dynamic programming. Interestingly, our formulation prescribes a novel protocol for computing a trajectory: instead of predicting the next state given its predecessor, as in standard RL, a goal-conditioned trajectory is constructed by first predicting an intermediate state between start and goal, partitioning the trajectory into two segments. Then, intermediate points are recursively predicted on each sub-segment until a complete trajectory is obtained. We call this trajectory structure a sub-goal tree. Building on it, we additionally extend the policy gradient methodology to recursively predict sub-goals, resulting in novel goal-based algorithms. Finally, we apply our method to neural motion planning, where we demonstrate significant improvements compared to standard RL on navigating a 7-DoF robot arm between obstacles.
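The APSP dynamic programming equation the abstract builds on can be sketched as the min-plus "repeated squaring" recursion below. This is a hedged reconstruction in our own notation (V_k for the value function at level k, c for the single-segment cost, m for a candidate sub-goal), not text quoted from the paper:

    V_0(s, g) = c(s, g)
    V_k(s, g) = \min_m \left[ V_{k-1}(s, m) + V_{k-1}(m, g) \right]

Each application of the recursion doubles the number of segments a path may use, so trajectories of up to 2^k segments are covered after k levels, rather than after 2^k sequential Bellman backups as in the standard single-goal formulation.

The sub-goal tree trajectory protocol itself can likewise be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical predict_subgoal(start, goal) policy that returns an intermediate state between its two arguments; it is not the authors' implementation:

    def build_trajectory(start, goal, depth, predict_subgoal):
        """Recursively build a start-to-goal trajectory of 2**depth segments.

        Each call predicts one intermediate sub-goal, splits the segment
        in two, and recurses on both halves until depth reaches zero.
        """
        if depth == 0:
            return [start, goal]
        mid = predict_subgoal(start, goal)
        left = build_trajectory(start, mid, depth - 1, predict_subgoal)
        right = build_trajectory(mid, goal, depth - 1, predict_subgoal)
        return left + right[1:]  # drop the midpoint duplicated across halves

Read as a tree, the root is the first predicted sub-goal and the leaves, in order, form the trajectory; predictions at the same depth are independent of one another, so they could in principle be computed in parallel.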
