Paper Title
Human Instruction-Following with Deep Reinforcement Learning via Transfer-Learning from Text
Paper Authors
Paper Abstract
Recent work has described neural-network-based agents that are trained with reinforcement learning (RL) to execute language-like commands in simulated worlds, as a step towards an intelligent agent or robot that can be instructed by human users. However, the optimisation of multi-goal motor policies via deep RL from scratch requires many episodes of experience. Consequently, instruction-following with deep RL typically involves language generated from templates (by an environment simulator), which does not reflect the varied or ambiguous expressions of real users. Here, we propose a conceptually simple method for training instruction-following agents with deep RL that are robust to natural human instructions. By applying our method with a state-of-the-art pre-trained text-based language model (BERT), on tasks requiring agents to identify and position everyday objects relative to other objects in a naturalistic 3D simulated room, we demonstrate substantially-above-chance zero-shot transfer from synthetic template commands to natural instructions given by humans. Our approach is a general recipe for training any deep RL-based system to interface with human users, and bridges the gap between two research directions of notable recent success: agent-centric motor behavior and text-based representation learning.