论文标题
崩溃的土匪及其在公共卫生干预措施中的应用
Collapsing Bandits and Their Application to Public Health Interventions
论文作者
论文摘要
我们提出和研究碰撞匪徒,这是一种新的不安的多武器匪徒(RMAB)设置,在该设置中,每个手臂都遵循具有特殊结构的二进制马尔可夫过程:当播放手臂时,状态会完全观察到,因此“崩溃”了任何不确定性,但是当手臂是被动的,因此没有观察到,因此,没有观察到不确定的不确定性。目的是通过计划每轮行动预算有限,将尽可能多的武器保持在“良好”状态。这种崩溃的匪徒是许多医疗领域的天然模型,工人必须同时监测患者并以最大化患者同伴健康的方式提供干预措施。我们的主要贡献如下:(i)基于RMAB的Whittle索引技术的构建,我们得出了可倒塌的匪徒问题的条件。我们的派生取决于新的条件,这些新条件是何时最佳策略可能采用“正向”或“反向”阈值策略的形式。 (ii)我们利用阈值策略的最佳性,以构建用于计算晶体指数(包括封闭形式)的快速算法。 (iii)我们评估了几个数据分布的算法,包括来自现实世界中医疗保健任务的数据,其中工人必须监测和提供干预措施,以最大程度地遵守结核病药物。与最先进的RMAB技术相比,我们的算法达到了3阶速度的速度,同时达到了相似的性能。
We propose and study Collpasing Bandits, a new restless multi-armed bandit (RMAB) setting in which each arm follows a binary-state Markovian process with a special structure: when an arm is played, the state is fully observed, thus "collapsing" any uncertainty, but when an arm is passive, no observation is made, thus allowing uncertainty to evolve. The goal is to keep as many arms in the "good" state as possible by planning a limited budget of actions per round. Such Collapsing Bandits are natural models for many healthcare domains in which workers must simultaneously monitor patients and deliver interventions in a way that maximizes the health of their patient cohort. Our main contributions are as follows: (i) Building on the Whittle index technique for RMABs, we derive conditions under which the Collapsing Bandits problem is indexable. Our derivation hinges on novel conditions that characterize when the optimal policies may take the form of either "forward" or "reverse" threshold policies. (ii) We exploit the optimality of threshold policies to build fast algorithms for computing the Whittle index, including a closed-form. (iii) We evaluate our algorithm on several data distributions including data from a real-world healthcare task in which a worker must monitor and deliver interventions to maximize their patients' adherence to tuberculosis medication. Our algorithm achieves a 3-order-of-magnitude speedup compared to state-of-the-art RMAB techniques while achieving similar performance.