Paper title
Mirror Descent and the Information Ratio
Paper authors
Paper abstract
We establish a connection between the stability of mirror descent and the information ratio by Russo and Van Roy [2014]. Our analysis shows that mirror descent with suitable loss estimators and exploratory distributions enjoys the same bound on the adversarial regret as the bounds on the Bayesian regret for information-directed sampling. Along the way, we develop the theory for information-directed sampling and provide an efficient algorithm for adversarial bandits for which the regret upper bound matches exactly the best known information-theoretic upper bound.