Paper Title

Sub-Goal Trees -- a Framework for Goal-Based Reinforcement Learning

Paper Authors

Tom Jurgenson, Or Avner, Edward Groshev, Aviv Tamar

Paper Abstract

Many AI problems, in robotics and other domains, are goal-based, essentially seeking trajectories leading to various goal states. Reinforcement learning (RL), building on Bellman's optimality equation, naturally optimizes for a single goal, yet can be made multi-goal by augmenting the state with the goal. Instead, we propose a new RL framework, derived from a dynamic programming equation for the all pairs shortest path (APSP) problem, which naturally solves multi-goal queries. We show that this approach has computational benefits for both standard and approximate dynamic programming. Interestingly, our formulation prescribes a novel protocol for computing a trajectory: instead of predicting the next state given its predecessor, as in standard RL, a goal-conditioned trajectory is constructed by first predicting an intermediate state between start and goal, partitioning the trajectory into two segments. Then, intermediate points are recursively predicted on each sub-segment until a complete trajectory is obtained. We call this trajectory structure a sub-goal tree. Building on it, we additionally extend the policy gradient methodology to recursively predict sub-goals, resulting in novel goal-based algorithms. Finally, we apply our method to neural motion planning, where we demonstrate significant improvements compared to standard RL on navigating a 7-DoF robot arm between obstacles.
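The APSP dynamic programming equation the abstract builds on can be sketched as the min-plus "repeated squaring" recursion below. This is a hedged reconstruction in our own notation (V_k for the value function at level k, c for the single-segment cost, m for a candidate sub-goal), not text quoted from the paper:

    V_0(s, g) = c(s, g)
    V_k(s, g) = \min_m \left[ V_{k-1}(s, m) + V_{k-1}(m, g) \right]

Each application of the recursion doubles the number of segments a path may use, so trajectories of up to 2^k segments are covered after k levels, rather than after 2^k sequential Bellman backups as in the standard single-goal formulation.

The sub-goal tree trajectory protocol itself can likewise be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical predict_subgoal(start, goal) policy that returns an intermediate state between its two arguments; it is not the authors' implementation:

    def build_trajectory(start, goal, depth, predict_subgoal):
        """Recursively build a start-to-goal trajectory of 2**depth segments.

        Each call predicts one intermediate sub-goal, splits the segment
        in two, and recurses on both halves until depth reaches zero.
        """
        if depth == 0:
            return [start, goal]
        mid = predict_subgoal(start, goal)
        left = build_trajectory(start, mid, depth - 1, predict_subgoal)
        right = build_trajectory(mid, goal, depth - 1, predict_subgoal)
        return left + right[1:]  # drop the midpoint duplicated across halves

Read as a tree, the root is the first predicted sub-goal and the leaves, in order, form the trajectory; predictions at the same depth are independent of one another, so they could in principle be computed in parallel.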
