Paper Title

How Transferable are the Representations Learned by Deep Q Agents?

Paper Authors

Jacob Tyo, Zachary Lipton

Abstract

In this paper, we consider the source of Deep Reinforcement Learning (DRL)'s sample complexity, asking how much derives from the requirement of learning useful representations of environment states and how much is due to the sample complexity of learning a policy. While for DRL agents, the distinction between representation and policy may not be clear, we seek new insight through a set of transfer learning experiments. In each experiment, we retain some fraction of layers trained on either the same game or a related game, comparing the benefits of transfer learning to learning a policy from scratch. Interestingly, we find that benefits due to transfer are highly variable in general and non-symmetric across pairs of tasks. Our experiments suggest that perhaps transfer from simpler environments can boost performance on more complex downstream tasks and that the requirements of learning a useful representation can range from negligible to the majority of the sample complexity, based on the environment. Furthermore, we find that fine-tuning generally outperforms training with the transferred layers frozen, confirming an insight first noted in the classification setting.
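The abstract describes a layer-transfer protocol: copy some fraction of layers from an agent trained on a source game into a fresh agent for a target game, then either freeze those layers or fine-tune them. The sketch below is a minimal illustration of that setup, not the authors' code: the PyTorch framework, the `DQN` architecture, the `transfer_layers` helper, the layer counts, and the Pong/Breakout example are all assumptions made for illustration.

```python
# Minimal sketch (illustrative only, not the authors' implementation) of the
# transfer setup described in the abstract: copy the first k conv layers of a
# DQN trained on a source game into a fresh network for the target game, then
# either freeze them or fine-tune everything.

import torch
import torch.nn as nn

class DQN(nn.Module):
    """Atari-style DQN: three conv layers followed by two fully connected layers."""
    def __init__(self, n_actions: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),
        )

    def forward(self, x):
        return self.head(self.features(x))

def transfer_layers(source: DQN, target: DQN, n_conv_layers: int, freeze: bool):
    """Copy the first n_conv_layers conv blocks from source into target.

    freeze=True corresponds to training with the transferred layers frozen;
    freeze=False corresponds to fine-tuning, which the paper reports
    generally performs better.
    """
    # Each conv block occupies two slots in the Sequential container (Conv2d, ReLU).
    for i in range(n_conv_layers * 2):
        src_mod, tgt_mod = source.features[i], target.features[i]
        tgt_mod.load_state_dict(src_mod.state_dict())
        if freeze:
            for p in tgt_mod.parameters():
                p.requires_grad = False

# Hypothetical usage: transfer the two earliest conv layers from a Pong agent
# (6 actions) to a fresh Breakout agent (4 actions), then fine-tune end to end.
source_net = DQN(n_actions=6)   # assumed pretrained on the source game
target_net = DQN(n_actions=4)   # fresh network for the target game
transfer_layers(source_net, target_net, n_conv_layers=2, freeze=False)

# Only parameters with requires_grad=True are updated during fine-tuning.
optimizer = torch.optim.Adam(
    (p for p in target_net.parameters() if p.requires_grad), lr=1e-4
)
```

With `freeze=True`, the same helper reproduces the frozen-transfer baseline, so the two experimental conditions in the abstract differ only in that one flag.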
