The Act of Remembering: A Study in Partially Observable Reinforcement Learning

Paper title

The Act of Remembering: A Study in Partially Observable Reinforcement Learning

Authors

Rodrigo Toro Icarte, Richard Valenzano, Toryn Q. Klassen, Phillip Christoffersen, Amir-massoud Farahmand, Sheila A. McIlraith

Abstract

Reinforcement Learning (RL) agents typically learn memoryless policies: policies that only consider the last observation when selecting actions. Learning memoryless policies is efficient and optimal in fully observable environments. However, some form of memory is necessary when RL agents are faced with partial observability. In this paper, we study a lightweight approach to tackle partial observability in RL. We provide the agent with an external memory and additional actions to control what, if anything, is written to the memory. At every step, the current memory state is part of the agent's observation, and the agent selects a tuple of actions: one action that modifies the environment and another that modifies the memory. When the external memory is sufficiently expressive, optimal memoryless policies yield globally optimal solutions. Unfortunately, previous attempts to use external memory in the form of binary memory have produced poor results in practice. Here, we investigate alternative forms of memory in support of learning effective memoryless policies. Our novel forms of memory outperform binary and LSTM-based memory in well-established partially observable domains.
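The external-memory setup described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the class names `CueRecallEnv` and `ExternalMemoryWrapper`, the toy task, and the `reset`/`step` interface are all assumptions made for illustration. It uses the binary form of memory that the abstract mentions as prior work, not the paper's novel memory forms. The wrapper appends the memory state to each observation and accepts a tuple of actions, one for the environment and one for the memory.

```python
class CueRecallEnv:
    """Hypothetical toy task: a cue (0 or 1) is visible only at the start;
    one step later the agent must repeat it to earn a reward of 1."""

    HIDDEN = -1  # observation shown once the cue is no longer visible

    def __init__(self, cue):
        self.cue = cue
        self.t = 0

    def reset(self):
        self.t = 0
        return self.cue

    def step(self, action):
        self.t += 1
        if self.t == 1:
            # The cue disappears; no reward yet.
            return self.HIDDEN, 0.0, False
        reward = 1.0 if action == self.cue else 0.0
        return self.HIDDEN, reward, True


class ExternalMemoryWrapper:
    """Adds num_bits of binary external memory to an environment (assumed
    interface). Observations become (obs, memory); actions become
    (env_action, mem_action), where mem_action is a tuple of bits to write
    or None to leave the memory unchanged."""

    def __init__(self, env, num_bits=1):
        self.env = env
        self.num_bits = num_bits
        self.memory = (0,) * num_bits

    def reset(self):
        self.memory = (0,) * self.num_bits
        return (self.env.reset(), self.memory)

    def step(self, action):
        env_action, mem_action = action
        if mem_action is not None:
            if len(mem_action) != self.num_bits:
                raise ValueError("mem_action must have num_bits entries")
            self.memory = tuple(mem_action)
        obs, reward, done = self.env.step(env_action)
        return (obs, self.memory), reward, done


def memoryless_policy(aug_obs):
    """Depends only on the current augmented observation."""
    obs, memory = aug_obs
    if obs != CueRecallEnv.HIDDEN:
        return (0, (obs,))      # cue visible: write it to memory
    return (memory[0], None)    # cue hidden: act on the stored bit


def run_episode(env, policy):
    aug_obs, total, done = env.reset(), 0.0, False
    while not done:
        aug_obs, reward, done = env.step(policy(aug_obs))
        total += reward
    return total
```

In this toy task a policy that sees only the raw observation cannot tell the two cues apart at the final step, whereas the same memoryless policy over the augmented observation solves both cases. The policy here is hand-written; in the setting the abstract describes it would be learned by an RL algorithm.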
