A Survey of Risk-Aware Multi-Armed Bandits

Paper Title

A Survey of Risk-Aware Multi-Armed Bandits

Authors

Vincent Y. F. Tan, Prashanth L. A., Krishna Jagannathan

Abstract

In several applications such as clinical trials and financial portfolio optimization, the expected value (or the average reward) does not satisfactorily capture the merits of a drug or a portfolio. In such applications, risk plays a crucial role, and a risk-aware performance measure is preferable, so as to capture losses in the case of adverse events. This survey aims to consolidate and summarise the existing research on risk measures, specifically in the context of multi-armed bandits. We review various risk measures of interest, and comment on their properties. Next, we review existing concentration inequalities for various risk measures. Then, we proceed to define risk-aware bandit problems. We consider algorithms for the regret minimization setting, where the exploration-exploitation trade-off manifests, as well as the best-arm identification setting, which is a pure exploration problem, both in the context of risk-sensitive measures. We conclude by commenting on persisting challenges and fertile areas for future research.
