Paper title

Hitting time for Markov decision process

Authors

Jiang, Ruichao; Tavakoli, Javad; Zhao, Yiqiang

Abstract

We define the hitting time for a Markov decision process (MDP). We do not use the hitting time of the Markov process (MP) induced by the MDP, because the induced chain may not have a stationary distribution. Even if it does, that stationary distribution may not coincide with the (normalized) occupancy measure of the MDP. We observe a relationship between MDPs and PageRank. Using this observation, we construct an MP whose stationary distribution coincides with the normalized occupancy measure of the MDP, and we define the hitting time of the MDP as the hitting time of the associated MP.
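
The abstract does not spell out the construction, so the following is only a minimal sketch of one way the PageRank connection can be made concrete; it is not confirmed to be the authors' method. It assumes a discounted setting with a fixed policy `pi`, an initial distribution `mu`, and a discount factor `gamma`, and it uses a PageRank-style "restart" chain `gamma * P_pi + (1 - gamma) * 1 mu^T`. All names (`induced_chain`, `pagerank_chain`, and so on) and the toy numbers are hypothetical. Under these assumptions, the stationary distribution of the restart chain equals the normalized discounted occupancy measure `(1 - gamma) * mu^T (I - gamma * P_pi)^{-1}`, and hitting times can be computed on that chain.

```python
import numpy as np

def induced_chain(P, pi):
    """State-to-state matrix under policy pi. P has shape (S, A, S), pi has shape (S, A)."""
    return np.einsum("sa,sat->st", pi, P)

def pagerank_chain(P_pi, mu, gamma):
    """Assumed construction: follow P_pi with prob. gamma, restart from mu otherwise."""
    S = P_pi.shape[0]
    return gamma * P_pi + (1.0 - gamma) * np.outer(np.ones(S), mu)

def normalized_occupancy(P_pi, mu, gamma):
    """d^T = (1 - gamma) mu^T (I - gamma P_pi)^{-1}, solved as a linear system."""
    S = P_pi.shape[0]
    return np.linalg.solve((np.eye(S) - gamma * P_pi).T, (1.0 - gamma) * mu)

def stationary_distribution(M):
    """Solve x M = x with sum(x) = 1 in the least-squares sense."""
    S = M.shape[0]
    A = np.vstack([M.T - np.eye(S), np.ones(S)])
    b = np.zeros(S + 1)
    b[-1] = 1.0
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

def expected_hitting_times(M, target):
    """Expected steps to first reach `target`: h = 1 + Q h off-target, h[target] = 0."""
    S = M.shape[0]
    others = [s for s in range(S) if s != target]
    Q = M[np.ix_(others, others)]
    h = np.zeros(S)
    h[others] = np.linalg.solve(np.eye(len(others)) - Q, np.ones(len(others)))
    return h

# Hypothetical toy MDP: 2 states, 2 actions. Under pi, state 1 is absorbing.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # transitions from state 0 under actions 0 and 1
    [[0.0, 1.0], [0.5, 0.5]],   # transitions from state 1 under actions 0 and 1
])
pi = np.array([[0.5, 0.5], [1.0, 0.0]])
mu = np.array([0.5, 0.5])
gamma = 0.9

P_pi = induced_chain(P, pi)
M = pagerank_chain(P_pi, mu, gamma)

d = normalized_occupancy(P_pi, mu, gamma)
stat = stationary_distribution(M)
h = expected_hitting_times(M, target=0)

print("occupancy measure:      ", d)
print("stationary dist. of M:  ", stat)
print("hitting times to state 0:", h)
```

In this toy example the induced chain `P_pi` never returns to state 0 once it reaches state 1, so its stationary distribution puts all mass on state 1 and differs from the occupancy measure. The restart chain `M` has finite hitting times to state 0 because `mu` gives state 0 positive probability.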
