Adaptive Online Value Function Approximation with Wavelets

Paper Title

Adaptive Online Value Function Approximation with Wavelets

Authors

Michael Beukman, Michael Mitchley, Dean Wookey, Steven James, George Konidaris

Abstract

Using function approximation to represent a value function is necessary for continuous and high-dimensional state spaces. Linear function approximation has desirable theoretical guarantees and often requires less compute and samples than neural networks, but most approaches suffer from an exponential growth in the number of functions as the dimensionality of the state space increases. In this work, we introduce the wavelet basis for reinforcement learning. Wavelets can effectively be used as a fixed basis and additionally provide the ability to adaptively refine the basis set as learning progresses, making it feasible to start with a minimal basis set. This adaptive method can either increase the granularity of the approximation at a point in state space, or add in interactions between different dimensions as necessary. We prove that wavelets are both necessary and sufficient if we wish to construct a function approximator that can be adaptively refined without loss of precision. We further demonstrate that a fixed wavelet basis set performs comparably against the high-performing Fourier basis on Mountain Car and Acrobot, and that the adaptive methods provide a convenient approach to addressing an oversized initial basis set, while demonstrating performance comparable to, or greater than, the fixed wavelet basis.
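The idea of refining a basis set "without loss of precision" can be illustrated with a small sketch. The code below is an assumption-laden toy, not the authors' method: it uses the Haar scaling function on [0, 1) purely for simplicity (the abstract does not say which wavelet family is used), and the names `haar_scaling`, `features`, and `refine` are hypothetical. It shows a linear approximator whose single coarse basis function is split into two finer ones that inherit the parent's weight, leaving the represented function unchanged.

```python
import numpy as np

def haar_scaling(x, scale, shift):
    """Haar scaling function phi(2**scale * x - shift).

    Equals 1 on [shift / 2**scale, (shift + 1) / 2**scale) and 0 elsewhere.
    """
    u = (2.0 ** scale) * np.asarray(x, dtype=float) - shift
    return ((u >= 0.0) & (u < 1.0)).astype(float)

def features(x, basis):
    """Evaluate every (scale, shift) basis function at x; one row per function."""
    return np.array([haar_scaling(x, s, k) for (s, k) in basis])

def refine(basis, weights, index):
    """Replace one basis function by its two children at the next finer scale.

    For Haar, phi(x) = phi(2x) + phi(2x - 1), so giving both children the
    parent's weight leaves the approximated function unchanged.
    """
    s, k = basis[index]
    children = [(s + 1, 2 * k), (s + 1, 2 * k + 1)]
    new_basis = basis[:index] + children + basis[index + 1:]
    w = weights[index]
    new_weights = np.concatenate([weights[:index], [w, w], weights[index + 1:]])
    return new_basis, new_weights

# Start from a minimal basis: one coarse function covering [0, 1).
basis = [(0, 0)]
weights = np.array([0.5])  # illustrative value estimate, not real data

xs = np.linspace(0.0, 1.0, 11, endpoint=False)
before = weights @ features(xs, basis)

# Refine the coarse function; the value estimate is identical afterwards,
# but the two children can now be updated independently.
basis2, weights2 = refine(basis, weights, 0)
after = weights2 @ features(xs, basis2)

print(np.allclose(before, after))
```

After refinement, further learning updates can move the two child weights apart, which is how granularity would increase only where it is needed in this toy setting.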
