Do As I Can, Not As I Say: Grounding Language in Robotic Affordances

Paper Title

Do As I Can, Not As I Say: Grounding Language in Robotic Affordances

Authors

Ahn, Michael; Brohan, Anthony; Brown, Noah; Chebotar, Yevgen; Cortes, Omar; David, Byron; Finn, Chelsea; Fu, Chuyuan; Gopalakrishnan, Keerthana; Hausman, Karol; Herzog, Alex; Ho, Daniel; Hsu, Jasmine; Ibarz, Julian; Ichter, Brian; Irpan, Alex; Jang, Eric; Ruano, Rosario Jauregui; Jeffrey, Kyle; Jesmonth, Sally; Joshi, Nikhil J; Julian, Ryan; Kalashnikov, Dmitry; Kuang, Yuheng; Lee, Kuang-Huei; Levine, Sergey; Lu, Yao; Luu, Linda; Parada, Carolina; Pastor, Peter; Quiambao, Jornell; Rao, Kanishka; Rettinghouse, Jarek; Reyes, Diego; Sermanet, Pierre; Sievers, Nicolas; Tan, Clayton; Toshev, Alexander; Vanhoucke, Vincent; Xia, Fei; Xiao, Ted; Xu, Peng; Xu, Sichun; Yan, Mengyuan; Zeng, Andy

Abstract

Large language models can encode a wealth of semantic knowledge about the world. Such knowledge could be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language. However, a significant weakness of language models is that they lack real-world experience, which makes it difficult to leverage them for decision making within a given embodiment. For example, asking a language model to describe how to clean a spill might result in a reasonable narrative, but it may not be applicable to a particular agent, such as a robot, that needs to perform this task in a particular environment. We propose to provide real-world grounding by means of pretrained skills, which are used to constrain the model to propose natural language actions that are both feasible and contextually appropriate. The robot can act as the language model's "hands and eyes," while the language model supplies high-level semantic knowledge about the task. We show how low-level skills can be combined with large language models so that the language model provides high-level knowledge about the procedures for performing complex and temporally-extended instructions, while value functions associated with these skills provide the grounding necessary to connect this knowledge to a particular physical environment. We evaluate our method on a number of real-world robotic tasks, where we show the need for real-world grounding and that this approach is capable of completing long-horizon, abstract, natural language instructions on a mobile manipulator. The project's website and the video can be found at https://say-can.github.io/.
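
The grounding idea in the abstract can be illustrated with a minimal sketch. It assumes that the language model's preference for a skill and the skill's value function are each available as a number in [0, 1], and that the two are combined by a simple product; the abstract does not state the combination rule, so the product is an assumption made here for illustration. The function names (`select_skill`, `toy_llm_score`, `toy_affordance`), the skill strings, and all scores are hypothetical stand-ins, not the authors' implementation.

```python
from typing import Callable, Dict, List


def select_skill(
    instruction: str,
    skills: List[str],
    llm_score: Callable[[str, str], float],
    affordance: Callable[[str], float],
) -> str:
    """Return the skill with the highest combined score.

    llm_score(instruction, skill): how useful the language model thinks
        the skill is for the instruction (hypothetical interface).
    affordance(skill): how likely the skill is to succeed from the
        current state, e.g. a value-function estimate (hypothetical).
    The product rule below is an assumption of this sketch.
    """
    combined: Dict[str, float] = {
        skill: llm_score(instruction, skill) * affordance(skill)
        for skill in skills
    }
    return max(combined, key=lambda skill: combined[skill])


# Toy numbers, invented for illustration only.
TOY_LLM = {
    "pick up the sponge": 0.5,
    "find a sponge": 0.4,
    "go to the table": 0.1,
}
# No sponge is in view, so picking one up is unlikely to succeed.
TOY_VALUE = {
    "pick up the sponge": 0.1,
    "find a sponge": 0.9,
    "go to the table": 0.8,
}


def toy_llm_score(instruction: str, skill: str) -> float:
    return TOY_LLM[skill]


def toy_affordance(skill: str) -> float:
    return TOY_VALUE[skill]


ungrounded = max(TOY_LLM, key=lambda skill: TOY_LLM[skill])
best = select_skill(
    "clean up the spill", list(TOY_LLM), toy_llm_score, toy_affordance
)
print("language model alone:", ungrounded)
print("with grounding:", best)
```

With these toy numbers the language model alone prefers "pick up the sponge", while the grounded choice is "find a sponge", because the low value estimate marks picking up as infeasible in the current state.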
