Paper Title
Efficient Planning in Large MDPs with Weak Linear Function Approximation
Paper Authors
Paper Abstract
Large-scale Markov decision processes (MDPs) require planning algorithms whose runtime is independent of the number of states of the MDP. We consider the planning problem in MDPs using linear value function approximation with only weak requirements: low approximation error for the optimal value function, and a small set of "core" states whose features span those of all other states. In particular, we make no assumptions about the representability of policies or of value functions of non-optimal policies. Our algorithm produces almost-optimal actions for any state using a generative oracle (simulator) for the MDP, and its computation time scales polynomially with the number of features, core states, and actions, and with the effective horizon.
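The "core states" requirement can be illustrated with a small sketch. This is not the paper's algorithm; it only shows the spanning property the abstract relies on: if the feature vector of any state can be written as a linear combination of core-state feature vectors, then any linear value function is determined everywhere by its values at the core states. All names and numbers below are illustrative assumptions.

```python
# Minimal sketch (not the paper's algorithm): express a query state's
# feature vector phi(s) in the basis of core-state features, then combine
# the (assumed) core-state value estimates with the same weights.
# Linearity of V*(s) ~= theta . phi(s) justifies the final step.

def solve_linear(A, b):
    """Solve the small square system A x = b by Gaussian elimination
    with partial pivoting (pure stdlib, fine for a handful of features)."""
    n = len(A)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

# Two core states whose features span R^2 (rows are phi(c_1), phi(c_2)).
core_features = [[1.0, 0.0], [1.0, 1.0]]
core_values = [0.5, 2.0]  # hypothetical estimates of V* at the core states

phi_s = [1.0, 0.5]  # feature vector of a query state
# Solve phi(s) = sum_i w_i * phi(c_i): put core features in the columns.
A = [[core_features[j][i] for j in range(2)] for i in range(2)]
w = solve_linear(A, phi_s)
# By linearity, V*(s) ~= sum_i w_i * V*(c_i).
v_s = sum(w_i * v_i for w_i, v_i in zip(w, core_values))
print(w, v_s)  # weights [0.5, 0.5], value 1.25
```

The "weak" part of the assumption is that only the optimal value function needs to be well approximated in this feature space; nothing is assumed about the values of other policies.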