定性多目标可及以订购的分支MDP

论文标题

定性多目标可及以订购的分支MDP

Qualitative Multi-Objective Reachability for Ordered Branching MDPs

论文作者

Etessami, Kousha, Martinov, Emanuel

论文摘要

我们研究有序分支马尔可夫决策过程（OBMDPS）或等效的无上下文MDP的定性多目标可及性问题，这是基于分支马尔可夫决策过程（BMDPS）的单目标可及性的先验结果。我们为OBMDPS提供了两种单独的算法，用于“几乎是”和“极限”多目标可达性。具体而言，给定一个obmdp，$ \ natercal {a} $，给定一个起始非终端，并给定一组目标非末端$ k $ size $ k = | k | $，我们的第一个算法决定了至上的可能性，即在每一个目标非终端的树中，是否包含一个非终端的树中的每个目标non-term interminal in Bet seet neT seet套装$ k $ $ k $ $ 1 $ $ 1 $ 1 $ 1 $ 1 $ 1 $ 1 $ 1 $ 1 $ 1 $ 1 $ 1 $ 1 $ 1 $ 1 $ 1 $。我们的第二种算法决定了玩家几乎有策略（概率$ 1 $）会产生一棵树，该树包含套装$ k $中的每个目标非终端。需要这两个单独的算法：我们确实在这种情况下表明，“几乎是“ $ \ not = $”的限制限制“限制”，对于多目标可及性，这意味着播放器可能没有任何策略来实现概率$ 1 $的概率$ 1 $的概率，但可以在同一生成的树中获得$ 1 $ 1的概率，但可以实现$ 1 $ 1 $ 1的unionce $ 1的概率。两种算法在时间上运行$ 2^{o（k）} \ cdot | \ mathcal {a} |^{o（1）} $，其中$ | \ nathcal {a} | $是给定obmdp，$ \ mathcal {a} $的总数编码。因此，当$ k $固定时，它们在多项式时间内运行，并且相对于$ k $是固定参数。此外，我们表明，即使是定性的几乎是固定的（且限制）的多目标可及性决策问题也是NP-HARD的一般，而目标非终端的$ k $的尺寸$ k $没有固定。

We study qualitative multi-objective reachability problems for Ordered Branching Markov Decision Processes (OBMDPs), or equivalently context-free MDPs, building on prior results for single-target reachability on Branching Markov Decision Processes (BMDPs). We provide two separate algorithms for "almost-sure" and "limit-sure" multi-target reachability for OBMDPs. Specifically, given an OBMDP, $\mathcal{A}$, given a starting non-terminal, and given a set of target non-terminals $K$ of size $k = |K|$, our first algorithm decides whether the supremum probability, of generating a tree that contains every target non-terminal in set $K$, is $1$. Our second algorithm decides whether there is a strategy for the player to almost-surely (with probability $1$) generate a tree that contains every target non-terminal in set $K$. The two separate algorithms are needed: we show that indeed, in this context, "almost-sure" $\not=$ "limit-sure" for multi-target reachability, meaning that there are OBMDPs for which the player may not have any strategy to achieve probability exactly $1$ of reaching all targets in set $K$ in the same generated tree, but may have a sequence of strategies that achieve probability arbitrarily close to $1$. Both algorithms run in time $2^{O(k)} \cdot |\mathcal{A}|^{O(1)}$, where $|\mathcal{A}|$ is the total bit encoding length of the given OBMDP, $\mathcal{A}$. Hence they run in polynomial time when $k$ is fixed, and are fixed-parameter tractable with respect to $k$. Moreover, we show that even the qualitative almost-sure (and limit-sure) multi-target reachability decision problem is in general NP-hard, when the size $k$ of the set $K$ of target non-terminals is not fixed.

下载PDF全文

下载文献需遵守相关版权规定

论文标题