Paper title
Perturbation-based exploration methods in deep reinforcement learning
Paper authors
Abstract
Recent research on structured exploration has emphasized identifying novel states in the state space and incentivizing the agent to revisit them through intrinsic reward bonuses. In this study, we question whether the performance boost demonstrated by these methods is indeed due to the discovery of structure in the agent's exploratory schedule, or whether the benefit is largely attributable to the perturbations in the policy and reward spaces that arise in pursuit of structured exploration. We investigate the effect of perturbations in the policy and reward spaces on the agent's exploratory behavior. We show that simply perturbing the policy just before the softmax layer and introducing sporadic reward bonuses into the domain can greatly enhance exploration in several domains of the Arcade Learning Environment. In light of these findings, we recommend benchmarking any enhancements to structured exploration research against the backdrop of noisy exploration.
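The two perturbations named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the noise distribution, its scale, or how often bonuses are given, so the Gaussian noise, the `noise_scale`, `bonus`, and `prob` parameters, and the function names `perturbed_policy` and `sporadic_bonus` are all assumptions made for illustration.

```python
import numpy as np


def perturbed_policy(logits, noise_scale, rng):
    """Add zero-mean Gaussian noise to the logits, then apply softmax.

    Assumption: Gaussian noise is used here only as an example; the
    abstract does not state which noise distribution the paper uses.
    """
    noisy = logits + rng.normal(0.0, noise_scale, size=logits.shape)
    shifted = noisy - noisy.max()  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


def sporadic_bonus(reward, bonus, prob, rng):
    """Return reward + bonus with probability `prob`, else reward unchanged.

    Assumption: a fixed bonus given at random steps; the paper's actual
    bonus schedule is not described in the abstract.
    """
    if rng.random() < prob:
        return reward + bonus
    return reward


rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.1])  # hypothetical pre-softmax outputs

probs = perturbed_policy(logits, noise_scale=0.5, rng=rng)
action = rng.choice(len(probs), p=probs)  # sample an action from the noisy policy
shaped_reward = sporadic_bonus(0.0, bonus=1.0, prob=0.05, rng=rng)
```

Setting `noise_scale=0.0` and `prob=0.0` recovers the unperturbed policy and reward, which gives a natural baseline when comparing against structured exploration methods.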