有限状态马尔可夫连锁店的不平等现象及其在马尔可夫土匪上的应用

论文标题

有限状态马尔可夫连锁店的不平等现象及其在马尔可夫土匪上的应用

A Hoeffding Inequality for Finite State Markov Chains and its Applications to Markovian Bandits

论文作者

Moulos, Vrettos

论文摘要

本文对部分总和造成了$ \ sum_ {k = 1}^n f（x_k）$的不平等，其中$ \ {x_k \} _ {k \ in \ in \ mathbb {z} _ {z} _ {> 0}}} $是$ s $ s $ s $ s $ s $ s $ s $ s s $ s s $ s，s s s $ s，实值函数。我们的界限很简单，一般，因为它仅假定状态空间的不可约性和有限性。为了证明其有用性，我们在多军匪徒问题中提供了两种应用。第一个是要确定大约最佳的马尔可夫臂，而第二个则关注马尔可夫·匪徒的遗憾最小化。

This paper develops a Hoeffding inequality for the partial sums $\sum_{k=1}^n f (X_k)$, where $\{X_k\}_{k \in \mathbb{Z}_{> 0}}$ is an irreducible Markov chain on a finite state space $S$, and $f : S \to [a, b]$ is a real-valued function. Our bound is simple, general, since it only assumes irreducibility and finiteness of the state space, and powerful. In order to demonstrate its usefulness we provide two applications in multi-armed bandit problems. The first is about identifying an approximately best Markovian arm, while the second is concerned with regret minimization in the context of Markovian bandits.

下载PDF全文

下载文献需遵守相关版权规定

论文标题