Paper Title

Zero-Shot Compositional Policy Learning via Language Grounding

Paper Authors

Tianshi Cao, Jingkang Wang, Yining Zhang, Sivabalan Manivasagam

Paper Abstract

Despite recent breakthroughs in reinforcement learning (RL) and imitation learning (IL), existing algorithms fail to generalize beyond the training environments. In reality, humans can adapt to new tasks quickly by leveraging prior knowledge about the world, such as language descriptions. To facilitate research on language-guided agents with domain adaptation, we propose a novel zero-shot compositional policy learning task, where the environments are characterized as a composition of different attributes. Since there are no public environments supporting this study, we introduce a new research platform, BabyAI++, in which the dynamics of environments are disentangled from visual appearance. In each episode, BabyAI++ provides varied vision-dynamics combinations along with corresponding descriptive texts. To evaluate the adaptation capability of learned agents, a set of vision-dynamics pairings is held out for testing on BabyAI++. Unsurprisingly, we find that current language-guided RL/IL techniques overfit to the training environments and suffer a huge performance drop when facing unseen combinations. In response, we propose a multi-modal fusion method with an attention mechanism to perform visual-language grounding. Extensive experiments show strong evidence that language grounding is able to improve the generalization of agents across environments with varied dynamics.
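
The abstract describes the proposed model only at a high level: multi-modal fusion with an attention mechanism for visual-language grounding. The sketch below illustrates that general technique in PyTorch, where a language embedding attends over spatial visual features and pools them into a fused representation. This is not the authors' implementation; every module, dimension, and name here is a hypothetical choice made for illustration.

```python
import torch
import torch.nn as nn

# Minimal sketch of attention-based vision-language fusion (illustrative
# only; NOT the paper's architecture). All dimensions are assumptions.
class AttentionFusion(nn.Module):
    def __init__(self, vis_channels=64, txt_dim=128):
        super().__init__()
        # Project the sentence embedding into a per-channel query vector.
        self.query = nn.Linear(txt_dim, vis_channels)

    def forward(self, vis_feats, txt_emb):
        # vis_feats: (B, C, H, W) CNN features of the grid observation
        # txt_emb:   (B, D) embedding of the descriptive text
        q = self.query(txt_emb)                                  # (B, C)
        B, C, H, W = vis_feats.shape
        # Score each spatial location by how well its feature vector
        # matches the language-derived query.
        scores = (vis_feats * q.view(B, C, 1, 1)).sum(dim=1)    # (B, H, W)
        attn = torch.softmax(scores.view(B, -1), dim=1).view(B, 1, H, W)
        # Language-weighted pooling of the visual features.
        fused = (vis_feats * attn).sum(dim=(2, 3))              # (B, C)
        return fused

# Usage sketch: in a BabyAI++-style agent, the fused vector would
# typically feed an actor-critic policy head.
fusion = AttentionFusion()
vis = torch.randn(2, 64, 7, 7)   # e.g., a 7x7 egocentric grid view
txt = torch.randn(2, 128)        # encoded dynamics description
print(fusion(vis, txt).shape)    # torch.Size([2, 64])
```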
