论文标题

具有积极外部性的网络不安的土匪

Networked Restless Bandits with Positive Externalities

论文作者

Herlihy, Christine, Dickerson, John P.

论文摘要

不安的多武器匪徒通常用于建模预算受限的资源分配任务,其中收到资源与有利状态过渡的可能性增加有关。先前的工作假定只有直接收到资源时,只有单个武器受益。但是,许多分配任务发生在社区内,可以以积极的外部性为特征,这些外部性允许武器在邻居接收资源时获得部分利益。因此,我们引入了网络不安的土匪,这是一种新型的多武器匪徒,在该设置中,手臂既不躁动又嵌入在有向图内。然后,我们提出了Greta,这是一种基于图形的,基于质指数的启发式算法,可用于在每个时间步中有效构建受约束的奖励最大化动作向量。我们的经验结果表明,GRETA的表现优于一系列超参数值和图形拓扑的比较策略。

Restless multi-armed bandits are often used to model budget-constrained resource allocation tasks where receipt of the resource is associated with an increased probability of a favorable state transition. Prior work assumes that individual arms only benefit if they receive the resource directly. However, many allocation tasks occur within communities and can be characterized by positive externalities that allow arms to derive partial benefit when their neighbor(s) receive the resource. We thus introduce networked restless bandits, a novel multi-armed bandit setting in which arms are both restless and embedded within a directed graph. We then present Greta, a graph-aware, Whittle index-based heuristic algorithm that can be used to efficiently construct a constrained reward-maximizing action vector at each timestep. Our empirical results demonstrate that Greta outperforms comparison policies across a range of hyperparameter values and graph topologies.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源