Towards Soft Fairness in Restless Multi-Armed Bandits

Paper Title

Towards Soft Fairness in Restless Multi-Armed Bandits

Authors

Li, Dexun; Varakantham, Pradeep

Abstract

The restless multi-armed bandit (RMAB) is a framework for allocating limited resources under uncertainty. It is an extremely useful model for monitoring beneficiaries and executing timely interventions to ensure maximum benefit in public health settings (e.g., ensuring patients take medicines in tuberculosis settings, ensuring pregnant mothers listen to automated calls about good pregnancy practices). Because resources are limited, certain communities or regions are typically starved of interventions, which can have follow-on effects. To avoid starvation in the executed interventions across individuals/regions/communities, we first provide a soft fairness constraint and then provide an approach to enforce the soft fairness constraint in RMABs. The soft fairness constraint requires that an algorithm never probabilistically favor one arm over another if the long-term cumulative reward of choosing the latter arm is higher. Our approach incorporates a softmax-based value iteration method in the RMAB setting to design selection algorithms that satisfy the proposed fairness constraint. Our method, referred to as SoftFair, also provides theoretical performance guarantees and is asymptotically optimal. Finally, we demonstrate the utility of our approach on simulated benchmarks and show that the soft fairness constraint can be handled without a significant sacrifice in value.
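The soft fairness constraint stated in the abstract can be illustrated with a minimal Python sketch. This is not the SoftFair algorithm itself: the function names, the `temperature` parameter, and the example values are assumptions made for illustration, and the per-arm values stand in for long-term cumulative reward estimates that the paper obtains through its softmax-based value iteration, which is not reproduced here. The sketch only shows that selecting arms with softmax probabilities never gives a lower-valued arm a higher selection probability, while every arm keeps a nonzero chance of being picked.

```python
import math
import random


def softmax_probabilities(values, temperature=1.0):
    """Map per-arm value estimates to selection probabilities.

    `values` are hypothetical long-term cumulative reward estimates;
    `temperature` is an assumed tuning knob, not taken from the paper.
    """
    scaled = [v / temperature for v in values]
    largest = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - largest) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def select_arms(values, budget, temperature=1.0, rng=random):
    """Sample `budget` distinct arms, one at a time, by softmax probability."""
    remaining = list(range(len(values)))
    chosen = []
    for _ in range(min(budget, len(remaining))):
        probs = softmax_probabilities([values[i] for i in remaining], temperature)
        pick = rng.choices(remaining, weights=probs, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen


def is_softly_fair(values, probs):
    """Return True if no arm is favored over an arm with a higher value."""
    n = len(values)
    return all(
        probs[i] >= probs[j]
        for i in range(n)
        for j in range(n)
        if values[i] > values[j]
    )


if __name__ == "__main__":
    example_values = [2.0, 0.5, 1.0]  # made-up estimates for three arms
    example_probs = softmax_probabilities(example_values)
    print(example_probs, is_softly_fair(example_values, example_probs))
    print(select_arms(example_values, budget=2, rng=random.Random(0)))
```

A lower `temperature` concentrates probability on the highest-valued arms, while a higher one spreads interventions more evenly; in both cases the ordering of probabilities follows the ordering of values.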
