Voronoi Progressive Widening: Efficient Online Solvers for Continuous State, Action, and Observation POMDPs

Paper Title

Voronoi Progressive Widening: Efficient Online Solvers for Continuous State, Action, and Observation POMDPs

Authors

Lim, Michael H., Tomlin, Claire J., Sunberg, Zachary N.

Abstract

This paper introduces Voronoi Progressive Widening (VPW), a generalization of Voronoi optimistic optimization (VOO) and action progressive widening to partially observable Markov decision processes (POMDPs). Tree search algorithms can use VPW to effectively handle continuous or hybrid action spaces by efficiently balancing local and global action searching. This paper proposes two VPW-based algorithms and analyzes them from theoretical and simulation perspectives. Voronoi Optimistic Weighted Sparse Sampling (VOWSS) is a theoretical tool that justifies VPW-based online solvers, and it is the first algorithm with global convergence guarantees for continuous state, action, and observation POMDPs. Voronoi Optimistic Monte Carlo Planning with Observation Weighting (VOMCPOW) is a versatile and efficient algorithm that consistently outperforms state-of-the-art POMDP algorithms in several simulation experiments.
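The abstract describes VPW as balancing local and global action search inside a tree search. The sketch below illustrates that idea at a single tree node, and it is not the authors' implementation. It assumes a box-shaped action space, the common progressive widening rule "add a child while the number of children is at most k * N^alpha", and plain uniform rejection sampling to draw from the Voronoi cell of the best action. The names `should_widen`, `voronoi_sample`, `omega`, `k`, and `alpha` are hypothetical and chosen for illustration.

```python
import math
import random


def should_widen(num_children, num_visits, k=1.0, alpha=0.5):
    """Progressive widening test (assumed form): allow a new action
    while the child count is at most k * N^alpha."""
    return num_children <= k * (num_visits ** alpha)


def sample_uniform(low, high, rng):
    """Draw one point uniformly from the box [low, high]."""
    return [rng.uniform(lo, hi) for lo, hi in zip(low, high)]


def voronoi_sample(actions, values, low, high, rng, omega=0.3, max_tries=1000):
    """Propose a new action for a node.

    With probability omega (or if the node has no actions yet), search
    globally by sampling the whole box. Otherwise search locally by
    rejection sampling a point whose nearest existing action is the
    current best one, i.e. a point in the best action's Voronoi cell.
    """
    if not actions or rng.random() < omega:
        return sample_uniform(low, high, rng)
    best = max(range(len(actions)), key=lambda i: values[i])
    for _ in range(max_tries):
        cand = sample_uniform(low, high, rng)
        nearest = min(range(len(actions)),
                      key=lambda i: math.dist(cand, actions[i]))
        if nearest == best:
            return cand
    # Fallback (an assumption of this sketch): reuse the best action.
    return list(actions[best])


if __name__ == "__main__":
    rng = random.Random(0)
    actions, values = [[0.1, 0.1], [0.9, 0.9]], [0.0, 1.0]
    if should_widen(len(actions), num_visits=16):
        print(voronoi_sample(actions, values, [0.0, 0.0], [1.0, 1.0], rng))
```

In this sketch, a larger `omega` shifts effort toward global exploration, while a smaller value concentrates new actions around the best one found so far. Uniform rejection sampling becomes slow as the best cell shrinks, so a practical solver would likely use a more targeted proposal.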
