Paper Title

Grounding Hindsight Instructions in Multi-Goal Reinforcement Learning for Robotics

Paper Authors

Röder, Frank, Eppe, Manfred, Wermter, Stefan

Paper Abstract

This paper focuses on robotic reinforcement learning with sparse rewards for natural language goal representations. An open problem is the sample-inefficiency that stems from the compositionality of natural language, and from the grounding of language in sensory data and actions. We address these issues with three contributions. We first present a mechanism for hindsight instruction replay utilizing expert feedback. Second, we propose a seq2seq model to generate linguistic hindsight instructions. Finally, we present a novel class of language-focused learning tasks. We show that hindsight instructions improve the learning performance, as expected. In addition, we also provide an unexpected result: We show that the learning performance of our agent can be improved by one third if, in a sense, the agent learns to talk to itself in a self-supervised manner. We achieve this by learning to generate linguistic instructions that would have been appropriate as a natural language goal for an originally unintended behavior. Our results indicate that the performance gain increases with the task-complexity.
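The core mechanism the abstract describes, relabeling a failed episode with a generated instruction that matches what the agent actually did, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `describe` function stands in for their seq2seq instruction generator, and all names and the string-based state encoding are hypothetical.

```python
# Minimal sketch of hindsight instruction replay. The seq2seq model that
# generates linguistic hindsight instructions is abstracted as `describe`;
# all identifiers here are illustrative, not the paper's API.
from typing import Callable, List, Optional, Tuple

# (state, instruction_goal, reward, next_state)
Transition = Tuple[str, str, float, str]

def hindsight_relabel(
    episode: List[Tuple[str, str]],            # (state, next_state) pairs from a failed rollout
    original_goal: str,                        # the instruction the agent failed to satisfy
    describe: Callable[[str], Optional[str]],  # stand-in for the seq2seq instruction generator
) -> List[Transition]:
    """Store the failed episode, then add a relabeled copy of its final step
    in which the achieved outcome is treated as if it had been the goal."""
    buffer: List[Transition] = []
    for state, next_state in episode:
        # Original transitions: sparse reward of 0, since the goal was missed.
        buffer.append((state, original_goal, 0.0, next_state))
    # Hindsight transition: the generated instruction describes the outcome
    # that actually occurred, so the final step earns the sparse reward of 1.
    final_state = episode[-1][1]
    hindsight_goal = describe(final_state)
    if hindsight_goal is not None:
        buffer.append((episode[-1][0], hindsight_goal, 1.0, final_state))
    return buffer

# Toy usage: a table lookup in place of the learned seq2seq model.
describe = {"cube_on_block": "put the cube on the block"}.get
episode = [("start", "cube_lifted"), ("cube_lifted", "cube_on_block")]
relabeled = hindsight_relabel(episode, "push the cube to the left", describe)
```

In the paper's self-supervised variant, the instruction generator itself is learned from the agent's experience, so the same rollouts both train `describe` and supply the relabeled goals.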
