Paper Title
Bootstrapped Q-learning with Context Relevant Observation Pruning to Generalize in Text-based Games
Paper Authors
Paper Abstract
We show that Reinforcement Learning (RL) methods for solving Text-Based Games (TBGs) often fail to generalize to unseen games, especially in small-data regimes. To address this issue, we propose Context Relevant Episodic State Truncation (CREST), which removes irrelevant tokens from observation text to improve generalization. Our method first trains a base model using Q-learning, which typically overfits the training games. The base model's action token distribution is then used to prune observations by removing irrelevant tokens. A second, bootstrapped model is retrained on the pruned observation text. Our bootstrapped agent shows improved generalization in solving unseen TextWorld games, using 10x-20x fewer training games than previous state-of-the-art methods while also requiring fewer training episodes.
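The pruning step described in the abstract can be illustrated with a minimal sketch. Everything below is a hypothetical interface, not the paper's actual implementation: we assume the base model exposes a per-token probability mass over its action vocabulary, and that observation tokens are kept only when that mass exceeds a threshold.

```python
# Hypothetical sketch of CREST-style observation pruning.
# Assumptions (not from the paper): `action_token_probs` maps each
# vocabulary token to the probability mass the base model's action
# distribution assigns it; `threshold` is an illustrative cutoff.

def prune_observation(observation: str,
                      action_token_probs: dict[str, float],
                      threshold: float = 0.1) -> str:
    """Keep only tokens the base model deems relevant to acting."""
    kept = [tok for tok in observation.split()
            if action_token_probs.get(tok.lower(), 0.0) >= threshold]
    return " ".join(kept)

# Toy example: the base model's action distribution concentrates
# on a few tokens such as "key" and "door".
probs = {"key": 0.6, "door": 0.3, "take": 0.5}
pruned = prune_observation("There is a rusty key near the red door", probs)
print(pruned)  # -> "key door"
```

The bootstrapped model would then be retrained on pruned strings like `"key door"` instead of the full observation, which is the mechanism the abstract credits for the improved generalization.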