强大的主动偏好引起

论文标题

强大的主动偏好引起

Robust Active Preference Elicitation

论文作者

Vayanos, Phebe, Ye, Yingxiao, McElfresh, Duncan, Dickerson, John, Rice, Eric

论文摘要

我们研究了通过中等数量的成对比较查询引起决策者的偏好的问题，以使其成为特定问题的高质量建议。我们受到高风险领域中的应用的激励，例如，在选择分配稀缺资源以满足基本需求的政策时（例如，为无家可归的人进行移植或住房的肾脏），需要从（部分）引起的偏好中提出结果建议。我们将偏好中的不确定性建模为基于设置的不确定性，并}调查两个设置：a）一个离线启发设置，其中所有查询均立即进行，b）在线启发设置，其中以自适应方式依次选择了查询。我们提出了这些问题的强大优化公式，这些公式将偏好启发和推荐阶段整合到最大化最坏情况的效用或最小化最坏情况的遗憾并研究其复杂性。对于脱机情况，如果主动偏好启发采用与决策相关信息发现的两半阶段强大的优化问题的形式，我们以混合二进制线性程序的形式提供了等效的重新重新制定，我们通过列和构造生成求解。对于在线设置，主动偏好学习采用与决策相关信息发现的多阶段强大优化问题的形式，我们提出了一种保守的解决方案方法。关于合成数据的数值研究表明，我们的方法在最差的案例，遗憾和效用方面，我们的方法表现优于文献的最先进方法。我们展示了如何使用我们的方法来帮助无家可归的服务机构选择一项政策，以将不同类型的稀缺住房资源分配给经历无家可归者的人。

We study the problem of eliciting the preferences of a decision-maker through a moderate number of pairwise comparison queries to make them a high quality recommendation for a specific problem. We are motivated by applications in high stakes domains, such as when choosing a policy for allocating scarce resources to satisfy basic needs (e.g., kidneys for transplantation or housing for those experiencing homelessness) where a consequential recommendation needs to be made from the (partially) elicited preferences. We model uncertainty in the preferences as being set based and} investigate two settings: a) an offline elicitation setting, where all queries are made at once, and b) an online elicitation setting, where queries are selected sequentially over time in an adaptive fashion. We propose robust optimization formulations of these problems which integrate the preference elicitation and recommendation phases with aim to either maximize worst-case utility or minimize worst-case regret, and study their complexity. For the offline case, where active preference elicitation takes the form of a two and half stage robust optimization problem with decision-dependent information discovery, we provide an equivalent reformulation in the form of a mixed-binary linear program which we solve via column-and-constraint generation. For the online setting, where active preference learning takes the form of a multi-stage robust optimization problem with decision-dependent information discovery, we propose a conservative solution approach. Numerical studies on synthetic data demonstrate that our methods outperform state-of-the art approaches from the literature in terms of worst-case rank, regret, and utility. We showcase how our methodology can be used to assist a homeless services agency in choosing a policy for allocating scarce housing resources of different types to people experiencing homelessness.

下载PDF全文

下载文献需遵守相关版权规定

论文标题