Learning Object-conditioned Exploration using Distributed Soft Actor Critic

Title

Learning Object-conditioned Exploration using Distributed Soft Actor Critic

Authors

Ayzaan Wahid, Austin Stone, Kevin Chen, Brian Ichter, Alexander Toshev

Abstract

Object navigation is defined as navigating to an object of a given label in a complex, unexplored environment. In its general form, this problem poses several challenges for robotics: semantic exploration of unknown environments in search of an object, and low-level control. In this work we study object-guided exploration and low-level control, and present an end-to-end trained navigation policy achieving a success rate of 0.68 and an SPL (Success weighted by Path Length) of 0.58 on unseen, visually complex scans of real homes. We propose a highly scalable implementation of an off-policy reinforcement learning algorithm, distributed Soft Actor Critic, which allows the system to utilize 98M experience steps in 24 hours on 8 GPUs. Our system learns to control a differential drive mobile base in simulation from a stack of high-dimensional observations commonly used on robotic platforms. The learned policy exhibits object-guided exploratory behavior and low-level control learned purely from experience in realistic environments.
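The abstract reports results as a success rate and an SPL. SPL is the standard embodied-navigation metric proposed by Anderson et al. (2018): SPL = (1/N) Σ_i S_i · l_i / max(p_i, l_i), where S_i is 1 for a successful episode and 0 otherwise, l_i is the shortest-path distance from start to goal, and p_i is the length of the path the agent actually took. The minimal sketch below computes both metrics from per-episode records. The function names and the episode values are hypothetical illustrations, not the authors' evaluation code, and treating a zero-length optimal path as a perfect score is an assumption made here to avoid division by zero.

```python
from typing import Sequence


def success_rate(successes: Sequence[bool]) -> float:
    """Fraction of episodes in which the agent reached the target object."""
    if not successes:
        raise ValueError("need at least one episode")
    return sum(1 for s in successes if s) / len(successes)


def spl(successes: Sequence[bool],
        shortest_lengths: Sequence[float],
        path_lengths: Sequence[float]) -> float:
    """Success weighted by Path Length (Anderson et al., 2018).

    SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i)
    """
    n = len(successes)
    if n == 0 or not (n == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, path_lengths):
        if not s:
            continue  # failed episodes contribute 0
        denom = max(p, l)
        # Assumption (hypothetical convention): if the agent started at the
        # goal (l == p == 0), count the success as perfectly efficient.
        # Otherwise a roundabout successful path is penalized by l / p.
        total += l / denom if denom > 0 else 1.0
    return total / n


if __name__ == "__main__":
    # Hypothetical episodes, not data from the paper.
    succ = [True, True, False, True]
    shortest = [5.0, 8.0, 6.0, 4.0]
    taken = [5.0, 10.0, 12.0, 8.0]
    print(success_rate(succ))  # 0.75
    print(spl(succ, shortest, taken))
```

Because SPL discounts every success by how much longer the agent's path was than the optimal one, it can never exceed the success rate; a policy with success 0.68 and SPL 0.58 therefore succeeds fairly often and, on average, does so along reasonably direct paths.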
