Paper Title
Inverse Reinforcement Learning with Natural Language Goals
Authors
Abstract
Humans generally use natural language to communicate task requirements to each other. Ideally, natural language should also be usable for communicating goals to autonomous machines (e.g., robots) to minimize friction in task specification. However, understanding and mapping natural language goals to sequences of states and actions is challenging. Specifically, existing work along these lines has encountered difficulty in generalizing learned policies to new natural language goals and environments. In this paper, we propose a novel adversarial inverse reinforcement learning algorithm to learn a language-conditioned policy and reward function. To improve generalization of the learned policy and reward function, we use a variational goal generator to relabel trajectories and sample diverse goals during training. Our algorithm outperforms multiple baselines by a large margin on a vision-based natural language instruction following dataset (Room-2-Room), demonstrating a promising advance in enabling the use of natural language instructions in specifying agent goals.
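The abstract describes learning a language-conditioned policy and reward function with adversarial inverse reinforcement learning (AIRL). Below is a minimal sketch of what a language-conditioned AIRL-style reward and discriminator could look like; it is not the paper's implementation, and all module choices, dimensions, and names (GoalEncoder, STATE_DIM, the GRU goal encoder, etc.) are assumptions made only for illustration.

```python
# Illustrative sketch (assumed architecture, not the authors' code): an AIRL-style
# reward r(s, a, g) that conditions on an encoding of the natural language goal g.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, GOAL_VOCAB, EMBED_DIM, HIDDEN = 32, 8, 1000, 64, 128

class GoalEncoder(nn.Module):
    """Encodes a tokenized natural language goal into a fixed-size vector."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(GOAL_VOCAB, EMBED_DIM)
        self.rnn = nn.GRU(EMBED_DIM, HIDDEN, batch_first=True)

    def forward(self, goal_tokens):            # goal_tokens: (B, T) int64
        _, h = self.rnn(self.embed(goal_tokens))
        return h.squeeze(0)                    # (B, HIDDEN)

class LanguageConditionedReward(nn.Module):
    """Reward network f(s, a, g): an MLP over state, action, and goal encoding."""
    def __init__(self):
        super().__init__()
        self.goal_enc = GoalEncoder()
        self.mlp = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM + HIDDEN, HIDDEN),
            nn.ReLU(),
            nn.Linear(HIDDEN, 1),
        )

    def forward(self, state, action, goal_tokens):
        g = self.goal_enc(goal_tokens)
        return self.mlp(torch.cat([state, action, g], dim=-1)).squeeze(-1)

def discriminator_logit(reward_net, state, action, goal_tokens, log_pi):
    # Standard AIRL discriminator D = exp(f) / (exp(f) + pi(a|s)), written as the
    # logit f(s, a, g) - log pi(a|s); expert transitions should score positive.
    return reward_net(state, action, goal_tokens) - log_pi

if __name__ == "__main__":
    net = LanguageConditionedReward()
    s = torch.randn(4, STATE_DIM)
    a = torch.randn(4, ACTION_DIM)
    g = torch.randint(0, GOAL_VOCAB, (4, 10))
    log_pi = torch.randn(4)
    print(discriminator_logit(net, s, a, g, log_pi).shape)  # torch.Size([4])
```

In this sketch the discriminator logit doubles as the learned reward signal, and the same goal encoding could in principle feed the policy; the variational goal generator used for trajectory relabeling in the paper is not shown here.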