Paper Title
An On-Line POMDP Solver for Continuous Observation Spaces
Paper Authors
Paper Abstract
Planning under partial observability is essential for autonomous robots. A principled way to address such planning problems is the Partially Observable Markov Decision Process (POMDP). Although solving POMDPs is computationally intractable, substantial advancements have been achieved in developing approximate POMDP solvers in the past two decades. However, computing robust solutions for problems with continuous observation spaces remains challenging. Most on-line solvers rely on discretising the observation space or artificially limiting the number of observations considered during planning in order to compute tractable policies. In this paper, we propose a new on-line POMDP solver, called Lazy Belief Extraction for Continuous POMDPs (LABECOP), that combines methods from Monte-Carlo-Tree-Search and particle filtering to construct a policy representation that requires neither a discretised observation space nor a limit on the number of observations considered during planning. Experiments on three different problems involving continuous observation spaces indicate that LABECOP performs similarly to or better than state-of-the-art POMDP solvers.
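To make the particle-filtering ingredient of the abstract concrete, the sketch below shows a generic particle-filter belief update with a continuous observation: particles are propagated through a transition model, weighted by the observation's likelihood density (rather than by exact matching, which has probability zero for continuous observations), and resampled. This is an illustrative, hypothetical sketch of the standard technique, not the LABECOP algorithm itself; the function names `transition` and `obs_likelihood` and the toy 1-D model are assumptions for the example.

```python
import math
import random

def pf_update(particles, action, observation, transition, obs_likelihood):
    """One generic particle-filter belief update for a continuous observation.

    Illustrative sketch only -- not the LABECOP algorithm. Each particle is a
    sampled state; the belief after (action, observation) is approximated by
    propagating, likelihood-weighting, and resampling the particle set.
    """
    # Propagate each particle through the (stochastic) transition model.
    propagated = [transition(s, action) for s in particles]
    # Weight by the likelihood density of the continuous observation.
    weights = [obs_likelihood(observation, s) for s in propagated]
    total = sum(weights)
    if total == 0.0:
        # Observation inconsistent with every particle; fall back to the prior.
        return propagated
    # Multinomial resampling proportional to the normalised weights.
    return random.choices(propagated, weights=weights, k=len(particles))

# Toy 1-D model (assumed for illustration): the state drifts by the action
# with Gaussian process noise, and the observation is the state plus noise.
def transition(s, a):
    return s + a + random.gauss(0.0, 0.1)

def obs_likelihood(o, s, sigma=0.5):
    return math.exp(-0.5 * ((o - s) / sigma) ** 2)

random.seed(0)
belief = [random.gauss(0.0, 1.0) for _ in range(1000)]
belief = pf_update(belief, action=1.0, observation=1.2,
                   transition=transition, obs_likelihood=obs_likelihood)
```

After the update, the particle cloud concentrates between the predicted state (around 1.0) and the observation (1.2), which is the Bayesian compromise a belief update should produce.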