Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency

Paper title

Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency

Authors

Qi Cai, Zhuoran Yang, Zhaoran Wang

Abstract

We study reinforcement learning for partially observed Markov decision processes (POMDPs) with infinite observation and state spaces, which remains less investigated theoretically. To this end, we make the first attempt at bridging partial observability and function approximation for a class of POMDPs with a linear structure. In detail, we propose a reinforcement learning algorithm (Optimistic Exploration via Adversarial Integral Equation or OP-TENET) that attains an $ε$-optimal policy within $O(1/ε^2)$ episodes. In particular, the sample complexity scales polynomially in the intrinsic dimension of the linear structure and is independent of the size of the observation and state spaces. The sample efficiency of OP-TENET is enabled by a sequence of ingredients: (i) a Bellman operator with finite memory, which represents the value function in a recursive manner, (ii) the identification and estimation of such an operator via an adversarial integral equation, which features a smoothed discriminator tailored to the linear structure, and (iii) the exploration of the observation and state spaces via optimism, which is based on quantifying the uncertainty in the adversarial integral equation.
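
The guarantee stated in the abstract can be written out in symbols. This is only a sketch in assumed notation: $V^{\pi}$ for the expected total reward of a policy $\pi$, $\pi^*$ for an optimal policy, $\hat{\pi}$ for the policy returned by OP-TENET, $K$ for the number of episodes, and $d$ for the intrinsic dimension of the linear structure are symbols introduced here for illustration and do not appear in the abstract. Any dependence on other problem parameters is suppressed.

$$
V^{\pi^*} - V^{\hat{\pi}} \le \varepsilon \quad \text{once} \quad K = \mathrm{poly}(d) \cdot O(1/\varepsilon^2).
$$

Read this way, the episode count involves only $d$ and $\varepsilon$, not the number of observations or states, which is consistent with both spaces being allowed to be infinite.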
