Learning Dynamic Belief Graphs to Generalize on Text-Based Games

Paper Title

Learning Dynamic Belief Graphs to Generalize on Text-Based Games

Authors

Ashutosh Adhikari, Xingdi Yuan, Marc-Alexandre Côté, Mikuláš Zelinka, Marc-Antoine Rondeau, Romain Laroche, Pascal Poupart, Jian Tang, Adam Trischler, William L. Hamilton

Abstract

Playing text-based games requires skills in processing natural language and sequential decision making. Achieving human-level performance on text-based games remains an open challenge, and prior research has largely relied on hand-crafted structured representations and heuristics. In this work, we investigate how an agent can plan and generalize in text-based games using graph-structured representations learned end-to-end from raw text. We propose a novel graph-aided transformer agent (GATA) that infers and updates latent belief graphs during planning to enable effective action selection by capturing the underlying game dynamics. GATA is trained using a combination of reinforcement and self-supervised learning. Our work demonstrates that the learned graph-based representations help agents converge to better policies than their text-only counterparts and facilitate effective generalization across game configurations. Experiments on 500+ unique games from the TextWorld suite show that our best agent outperforms text-based baselines by an average of 24.2%.
