Paper Title

Understanding and Preventing Capacity Loss in Reinforcement Learning

Paper Authors

Clare Lyle, Mark Rowland, Will Dabney

Paper Abstract

The reinforcement learning (RL) problem is rife with sources of non-stationarity, making it a notoriously difficult problem domain for the application of neural networks. We identify a mechanism by which non-stationary prediction targets can prevent learning progress in deep RL agents: capacity loss, whereby networks trained on a sequence of target values lose their ability to quickly update their predictions over time. We demonstrate that capacity loss occurs in a range of RL agents and environments, and is particularly damaging to performance in sparse-reward tasks. We then present a simple regularizer, Initial Feature Regularization (InFeR), that mitigates this phenomenon by regressing a subspace of features towards its value at initialization, leading to significant performance improvements in sparse-reward environments such as Montezuma's Revenge. We conclude that preventing capacity loss is crucial to enable agents to maximally benefit from the learning signals they obtain throughout the entire training trajectory.
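
The abstract describes InFeR as regressing a subspace of features towards its value at initialization. Below is a minimal sketch of how such a regularizer could be wired up in PyTorch; the `InFeR` class, the `encoder` argument, the number of auxiliary heads, the target scale `alpha`, and the loss weight in the usage note are illustrative assumptions rather than the authors' exact architecture or hyperparameters.

```python
# A sketch of an InFeR-style auxiliary loss: linear heads on the current
# features are regressed towards (scaled) targets produced by a frozen copy
# of the network taken at initialization. Assumed names and defaults are
# illustrative, not the paper's exact configuration.
import copy

import torch
import torch.nn as nn


class InFeR(nn.Module):
    def __init__(self, encoder: nn.Module, feature_dim: int,
                 num_heads: int = 10, alpha: float = 10.0):
        super().__init__()
        self.encoder = encoder                          # live, trainable feature extractor
        self.heads = nn.Linear(feature_dim, num_heads)  # k auxiliary linear heads
        # Frozen snapshots taken at initialization supply the regression targets.
        self.frozen_encoder = copy.deepcopy(encoder).requires_grad_(False).eval()
        self.frozen_heads = copy.deepcopy(self.heads).requires_grad_(False)
        self.alpha = alpha                              # target scale (assumed value)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Predictions computed from the current (drifting) features.
        pred = self.heads(self.encoder(obs))
        # Targets: the same linear projection of the features at initialization.
        with torch.no_grad():
            target = self.alpha * self.frozen_heads(self.frozen_encoder(obs))
        # Mean squared error pulls the feature subspace back towards its initial value.
        return ((pred - target) ** 2).mean()
```

In training, the regularizer's output would be added to the usual TD loss with some weight, e.g. `loss = td_loss + lam * infer(batch_obs)` with an assumed `lam` such as 0.1, so that the features continue to span the directions they covered at initialization while the agent fits its changing targets.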
