Title
Frequency-based Search-control in Dyna
Authors
Abstract
Model-based reinforcement learning has been empirically demonstrated to be a successful strategy for improving sample efficiency. In particular, Dyna is an elegant model-based architecture that integrates learning and planning and provides great flexibility in how a model is used. One of the most important components of Dyna is search-control, the process of generating states or state-action pairs from which we query the model to acquire simulated experiences. Search-control is critical to learning efficiency. In this work, we propose a simple and novel search-control strategy that searches high-frequency regions of the value function. Our main intuition builds on the Shannon sampling theorem from signal processing, which indicates that a high-frequency signal requires more samples to reconstruct. We empirically show that a high-frequency function is more difficult to approximate. This suggests a search-control strategy: we should use states from high-frequency regions of the value function to query the model and acquire more samples. We develop a simple strategy to locally measure the frequency of a function by its gradient and Hessian norms, and provide theoretical justification for this approach. We then apply our strategy to search-control in Dyna, and conduct experiments on benchmark domains to show its properties and effectiveness.
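To make the idea of "locally measuring frequency by gradient and Hessian norms" concrete, here is a minimal sketch of scoring candidate states and picking the highest-scoring ones as search-control states. The abstract does not say how the norms are computed or combined, so everything below is an assumption for illustration, not the authors' algorithm: central finite differences on a black-box value function `V`, the Frobenius norm of the Hessian, a priority equal to the gradient norm plus a weighted Hessian norm, and top-k selection from a candidate pool (for example, states taken from a replay buffer). All function names (`grad_norm`, `hessian_frobenius_norm`, `frequency_score`, `select_search_control_states`) are hypothetical.

```python
import math
from typing import Callable, List, Sequence

State = List[float]
ValueFn = Callable[[Sequence[float]], float]


def grad_norm(v: ValueFn, s: Sequence[float], eps: float = 1e-4) -> float:
    """Euclidean norm of dV/ds at s, estimated by central finite differences."""
    sq = 0.0
    for i in range(len(s)):
        plus, minus = list(s), list(s)
        plus[i] += eps
        minus[i] -= eps
        d = (v(plus) - v(minus)) / (2 * eps)
        sq += d * d
    return math.sqrt(sq)


def hessian_frobenius_norm(v: ValueFn, s: Sequence[float], eps: float = 1e-3) -> float:
    """Frobenius norm of the Hessian of V at s, via central finite differences."""
    n = len(s)

    def shifted(i: int, j: int, di: float, dj: float) -> float:
        # Evaluate V after moving coordinate i by di and coordinate j by dj.
        # When i == j, the single coordinate moves by di + dj.
        x = list(s)
        x[i] += di
        x[j] += dj
        return v(x)

    sq = 0.0
    for i in range(n):
        for j in range(n):
            # Mixed central difference for d^2 V / ds_i ds_j; for i == j it
            # reduces to the standard second difference with step 2 * eps.
            h_ij = (shifted(i, j, eps, eps) - shifted(i, j, eps, -eps)
                    - shifted(i, j, -eps, eps) + shifted(i, j, -eps, -eps)) / (4 * eps * eps)
            sq += h_ij * h_ij
    return math.sqrt(sq)


def frequency_score(v: ValueFn, s: Sequence[float], hess_weight: float = 1.0) -> float:
    """Hypothetical local 'frequency' priority: gradient norm plus weighted Hessian norm."""
    return grad_norm(v, s) + hess_weight * hessian_frobenius_norm(v, s)


def select_search_control_states(v: ValueFn, candidates: Sequence[Sequence[float]], k: int) -> List[State]:
    """Return the k candidate states with the highest frequency score."""
    ranked = sorted(candidates, key=lambda s: frequency_score(v, s), reverse=True)
    return [list(s) for s in ranked[:k]]


if __name__ == "__main__":
    # Toy value function: oscillates rapidly near s0 = 0 and is nearly flat far away.
    def toy_v(s: Sequence[float]) -> float:
        return math.exp(-(s[0] ** 2 + s[1] ** 2)) * math.sin(30 * s[0])

    pool = [[3.0, 0.0], [0.05, 0.0], [0.0, 3.0]]
    print(select_search_control_states(toy_v, pool, k=1))
```

In a Dyna loop, the selected states would then be paired with actions (for instance, chosen by the current policy) and passed to the learned model to generate simulated transitions for planning updates. Top-k ranking is used here only for simplicity; sampling candidates in proportion to their score is an equally plausible design under the same assumptions.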