论文标题
定性多目标可及以订购的分支MDP
Qualitative Multi-Objective Reachability for Ordered Branching MDPs
论文作者
论文摘要
我们研究有序分支马尔可夫决策过程(OBMDPS)或等效的无上下文MDP的定性多目标可及性问题,这是基于分支马尔可夫决策过程(BMDPS)的单目标可及性的先验结果。 我们为OBMDPS提供了两种单独的算法,用于“几乎是”和“极限”多目标可达性。具体而言,给定一个obmdp,$ \ natercal {a} $,给定一个起始非终端,并给定一组目标非末端$ k $ size $ k = | k | $,我们的第一个算法决定了至上的可能性,即在每一个目标非终端的树中,是否包含一个非终端的树中的每个目标non-term interminal in Bet seet neT seet套装$ k $ $ k $ $ 1 $ $ 1 $ 1 $ 1 $ 1 $ 1 $ 1 $ 1 $ 1 $ 1 $ 1 $ 1 $ 1 $ 1 $ 1 $ 1 $。我们的第二种算法决定了玩家几乎有策略(概率$ 1 $)会产生一棵树,该树包含套装$ k $中的每个目标非终端。 需要这两个单独的算法:我们确实在这种情况下表明,“几乎是“ $ \ not = $”的限制限制“限制”,对于多目标可及性,这意味着播放器可能没有任何策略来实现概率$ 1 $的概率$ 1 $的概率,但可以在同一生成的树中获得$ 1 $ 1的概率,但可以实现$ 1 $ 1 $ 1的unionce $ 1的概率。两种算法在时间上运行$ 2^{o(k)} \ cdot | \ mathcal {a} |^{o(1)} $,其中$ | \ nathcal {a} | $是给定obmdp,$ \ mathcal {a} $的总数编码。因此,当$ k $固定时,它们在多项式时间内运行,并且相对于$ k $是固定参数。此外,我们表明,即使是定性的几乎是固定的(且限制)的多目标可及性决策问题也是NP-HARD的一般,而目标非终端的$ k $的尺寸$ k $没有固定。
We study qualitative multi-objective reachability problems for Ordered Branching Markov Decision Processes (OBMDPs), or equivalently context-free MDPs, building on prior results for single-target reachability on Branching Markov Decision Processes (BMDPs). We provide two separate algorithms for "almost-sure" and "limit-sure" multi-target reachability for OBMDPs. Specifically, given an OBMDP, $\mathcal{A}$, given a starting non-terminal, and given a set of target non-terminals $K$ of size $k = |K|$, our first algorithm decides whether the supremum probability, of generating a tree that contains every target non-terminal in set $K$, is $1$. Our second algorithm decides whether there is a strategy for the player to almost-surely (with probability $1$) generate a tree that contains every target non-terminal in set $K$. The two separate algorithms are needed: we show that indeed, in this context, "almost-sure" $\not=$ "limit-sure" for multi-target reachability, meaning that there are OBMDPs for which the player may not have any strategy to achieve probability exactly $1$ of reaching all targets in set $K$ in the same generated tree, but may have a sequence of strategies that achieve probability arbitrarily close to $1$. Both algorithms run in time $2^{O(k)} \cdot |\mathcal{A}|^{O(1)}$, where $|\mathcal{A}|$ is the total bit encoding length of the given OBMDP, $\mathcal{A}$. Hence they run in polynomial time when $k$ is fixed, and are fixed-parameter tractable with respect to $k$. Moreover, we show that even the qualitative almost-sure (and limit-sure) multi-target reachability decision problem is in general NP-hard, when the size $k$ of the set $K$ of target non-terminals is not fixed.