On the Existence of Optimal Stationary Policies for Average Markov Decision Processes

Paper title

On the existence of optimal stationary policies for average Markov decision processes with countable states

Authors

Xia, Li; Guo, Xianping; Cao, Xi-Ren

Abstract

For a Markov decision process (MDP) with countably infinite states, the optimal value may not be achievable in the set of stationary policies. In this paper, we study conditions for the existence of an optimal stationary policy in a countable-state MDP under the long-run average criterion. With a properly defined metric on the policy space of ergodic MDPs, the existence of an optimal stationary policy is guaranteed by the compactness of the space and the continuity of the long-run average cost with respect to the metric. We further extend this condition with some assumptions that can be easily verified in control problems of specific systems, such as queueing systems. Our results complement the literature in the sense that our method can handle cost functions that are unbounded both from below and from above, requiring only continuity and ergodicity. Several examples are provided to illustrate the application of our main results.

Paper title