Paper Title
Stabilizing Transformer-Based Action Sequence Generation For Q-Learning
Paper Authors
Paper Abstract
Since the publication of the original Transformer architecture (Vaswani et al., 2017), Transformers have revolutionized the field of Natural Language Processing. This is mainly due to their ability to capture temporal dependencies better than competing RNN-based architectures. Surprisingly, this architectural change has not yet reached the field of Reinforcement Learning (RL), even though RNNs are quite popular in RL and temporal dependencies are very common there. Recently, Parisotto et al. (2019) conducted the first promising research on Transformers in RL. To support the findings of that work, this paper seeks to provide an additional example of a Transformer-based RL method. Specifically, the goal is a simple Transformer-based Deep Q-learning method that is stable across several environments. Given the unstable nature of both Transformers and RL, an extensive method search was conducted to arrive at a final method that leverages recent developments around Transformers as well as Q-learning. The proposed method matches the performance of classic Q-learning on control environments while showing potential on some selected Atari benchmarks. Furthermore, it is critically evaluated to give additional insight into the relation between Transformers and RL.
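Since the abstract does not specify the architecture, the following is a minimal illustrative sketch of what a Transformer-based Q-network could look like: a PyTorch Transformer encoder over a short window of past states, with Q-values read off the most recent time step. The class name, dimensions, and hyperparameters here are assumptions for illustration, not the paper's actual method.

```python
import torch
import torch.nn as nn

class TransformerQNetwork(nn.Module):
    """Hypothetical sketch: encode a short history of states with a
    Transformer encoder and predict per-action Q-values from the
    final time step. Not the paper's exact model."""

    def __init__(self, state_dim, num_actions, d_model=64,
                 nhead=4, num_layers=2, seq_len=8):
        super().__init__()
        self.embed = nn.Linear(state_dim, d_model)
        # Learned positional embeddings over the history window.
        self.pos = nn.Parameter(torch.zeros(seq_len, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, num_actions)

    def forward(self, states):
        # states: (batch, seq_len, state_dim)
        x = self.embed(states) + self.pos
        x = self.encoder(x)
        # Q-values come from the most recent time step only.
        return self.head(x[:, -1])

# Usage: greedy action over a history of 8 CartPole-like states.
net = TransformerQNetwork(state_dim=4, num_actions=2)
history = torch.randn(1, 8, 4)
action = net(history).argmax(dim=-1)
```

Such a network would slot into a standard DQN training loop in place of the usual MLP or CNN Q-network; the stabilization techniques the abstract alludes to (around both Transformers and Q-learning) are what the paper itself investigates.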