Paper Title

LEAF: Latent Exploration Along the Frontier

Authors

Homanga Bharadhwaj, Animesh Garg, Florian Shkurti

Abstract

Self-supervised goal proposal and goal reaching are key components of algorithms for exploration and efficient policy learning. Such a self-supervised approach, without access to any oracle goal-sampling distribution, requires deep exploration and commitment so that long-horizon plans can be discovered efficiently. In this paper, we propose an exploration framework that learns a dynamics-aware manifold of reachable states. For a given goal, our proposed method deterministically visits a state at the current frontier of reachable states (commitment/reaching) and then stochastically explores to reach the goal (exploration). This allocates the exploration budget near the frontier of the reachable region instead of its interior. We target the challenging problem of policy learning from initial and goal states specified as images, and do not assume any access to the underlying ground-truth states of the robot and the environment. To keep track of reachable latent states, we propose a distance-conditioned reachability network that is trained to infer whether one state is reachable from another within a specified latent-space distance. Given an initial state, we obtain a frontier of reachable states from that state. By incorporating a curriculum that samples easier goals (closer to the start state) before more difficult goals, we demonstrate that the proposed self-supervised exploration algorithm achieves superior performance compared to existing baselines on a set of challenging robotic environments.

https://sites.google.com/view/leaf-exploration
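The commit-then-explore loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a small grid world in place of image observations, a Manhattan distance in place of the learned latent distance, and a plain set of visited states in place of the distance-conditioned reachability network. All names here (`frontier`, `commit`, `run_episode`, `train`) are hypothetical.

```python
import random

ACTIONS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # toy grid moves
START = (0, 0)

def dist(a, b):
    """Manhattan distance, a toy stand-in for a learned latent distance."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def step(state, action, size):
    """Deterministic toy dynamics on a size x size grid, clipped at the walls."""
    x = min(size - 1, max(0, state[0] + action[0]))
    y = min(size - 1, max(0, state[1] + action[1]))
    return (x, y)

def frontier(visited):
    """Visited states at the largest distance from the start (sorted for determinism)."""
    k = max(dist(START, s) for s in visited)
    return sorted(s for s in visited if dist(START, s) == k)

def commit(target, size):
    """Walk deterministically from START to target (x first, then y)."""
    state, path = START, [START]
    while state != target:
        action = (1, 0) if state[0] < target[0] else (0, 1)
        state = step(state, action, size)
        path.append(state)
    return path

def run_episode(goal, visited, size, budget, rng):
    """One commit-then-explore episode; returns True if the goal is reached."""
    # Commitment: go straight to the frontier state nearest the goal.
    target = min(frontier(visited), key=lambda s: dist(s, goal))
    path = commit(target, size)
    visited.update(path)
    state = path[-1]
    # Exploration: spend the random-action budget from the frontier onward.
    for _ in range(budget):
        if state == goal:
            break
        state = step(state, rng.choice(ACTIONS), size)
        visited.add(state)
    return state == goal

def train(size=5, budget=8, episodes_per_goal=50, seed=0):
    """Attempt every goal in curriculum order; returns (visited, solved goals)."""
    rng = random.Random(seed)
    visited = {START}  # stand-in for what a reachability network would encode
    # Curriculum: goals nearer the start are attempted first.
    goals = sorted(((x, y) for x in range(size) for y in range(size)
                    if (x, y) != START), key=lambda g: dist(START, g))
    solved = [g for g in goals
              if any(run_episode(g, visited, size, budget, rng)
                     for _ in range(episodes_per_goal))]
    return visited, solved
```

Calling `train()` returns the set of visited states and the list of goals reached. Because each episode begins by walking to the frontier, the random budget is spent at the edge of the known region rather than on re-exploring its interior, which is the allocation argument the abstract makes.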
