Open-vocabulary Queryable Scene Representations for Real World Planning

Paper title

Open-vocabulary Queryable Scene Representations for Real World Planning

Authors

Boyuan Chen, Fei Xia, Brian Ichter, Kanishka Rao, Keerthana Gopalakrishnan, Michael S. Ryoo, Austin Stone, Daniel Kappler

Abstract

Large language models (LLMs) have unlocked new capabilities of task planning from human instructions. However, prior attempts to apply LLMs to real-world robotic tasks are limited by the lack of grounding in the surrounding scene. In this paper, we develop NLMap, an open-vocabulary and queryable scene representation to address this problem. NLMap serves as a framework to gather and integrate contextual information into LLM planners, allowing them to see and query available objects in the scene before generating a context-conditioned plan. NLMap first establishes a natural-language-queryable scene representation with visual language models (VLMs). An LLM-based object proposal module parses instructions and proposes the objects involved, which are used to query the scene representation for object availability and location. An LLM planner then plans with this information about the scene. NLMap allows robots to operate without a fixed list of objects or executable options, enabling real robot operation unachievable by previous methods. Project website: https://nlmap-saycan.github.io
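
The three stages named in the abstract (a queryable scene representation, object proposal from the instruction, and a scene query whose result is handed to the planner) can be illustrated with a toy sketch. This is not the authors' implementation: the hand-written 3-dimensional vectors stand in for VLM features, the keyword matcher stands in for the LLM object proposal module, and every name here (`SceneEntry`, `embed_text`, `propose_objects`, `query_scene`, `scene_context`, the 0.8 threshold) is hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Position = Tuple[float, float, float]


@dataclass
class SceneEntry:
    """One detected region: a feature vector plus a 3D position (toy values)."""
    embedding: List[float]   # stand-in for a VLM image feature
    position: Position


# Hypothetical text encoder: a lookup table instead of a real VLM text tower.
TOY_TEXT_EMBEDDINGS: Dict[str, List[float]] = {
    "sponge": [1.0, 0.0, 0.0],
    "coke can": [0.0, 1.0, 0.0],
    "apple": [0.0, 0.0, 1.0],
}


def embed_text(name: str) -> List[float]:
    return TOY_TEXT_EMBEDDINGS[name]


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def propose_objects(instruction: str) -> List[str]:
    """Stand-in for the LLM object proposal step: plain keyword matching."""
    text = instruction.lower()
    return [name for name in TOY_TEXT_EMBEDDINGS if name in text]


def query_scene(scene: List[SceneEntry], name: str,
                threshold: float = 0.8) -> Optional[Position]:
    """Return the position of the best-matching region, or None if absent."""
    query = embed_text(name)
    best = max(scene, key=lambda e: cosine(e.embedding, query), default=None)
    if best is None or cosine(best.embedding, query) < threshold:
        return None
    return best.position


def scene_context(scene: List[SceneEntry],
                  instruction: str) -> Dict[str, Optional[Position]]:
    """Availability and location per proposed object, to pass to a planner."""
    return {name: query_scene(scene, name)
            for name in propose_objects(instruction)}


# A toy scene containing something sponge-like and something can-like.
scene = [
    SceneEntry([0.9, 0.1, 0.0], (1.0, 2.0, 0.5)),
    SceneEntry([0.1, 0.95, 0.0], (3.0, 0.5, 0.8)),
]
context = scene_context(scene, "Wipe the table with a sponge and bring me an apple")
print(context)
```

In this toy scene the sponge is found and the apple is reported as unavailable, which is the kind of context a planner would need before committing to a plan.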
