Paper Title
Comparing Fair Ranking Metrics
Paper Authors
Paper Abstract
Ranked lists are frequently used by information retrieval (IR) systems to present results believed to be relevant to the user's information need. Fairness is a relatively new but important aspect of these rankings to measure, joining a rich set of metrics that go beyond traditional accuracy or utility constructs to provide a more holistic understanding of IR system behavior. In the last few years, several metrics have been proposed to quantify the (un)fairness of rankings, particularly with respect to specific group(s) of content providers, but comparative analyses of these metrics -- particularly for IR -- are lacking. There is therefore limited guidance for deciding which fairness metrics are applicable to a specific scenario, or for assessing the extent to which metrics agree or disagree when applied to real data. In this paper, we describe several fair ranking metrics from the existing literature in a common notation, enabling direct comparison of their assumptions, goals, and design choices; we then empirically compare them on multiple data sets covering both search and recommendation tasks.