Paper Title

Fairness in Streaming Submodular Maximization: Algorithms and Hardness

Authors

Marwa El Halabi, Slobodan Mitrović, Ashkan Norouzi-Fard, Jakab Tardos, Jakub Tarnawski

Abstract

Submodular maximization has become established as the method of choice for the task of selecting representative and diverse summaries of data. However, if datapoints have sensitive attributes such as gender or age, such machine learning algorithms, left unchecked, are known to exhibit bias: under- or over-representation of particular groups. This has made the design of fair machine learning algorithms increasingly important. In this work we address the question: Is it possible to create fair summaries for massive datasets? To this end, we develop the first streaming approximation algorithms for submodular maximization under fairness constraints, for both monotone and non-monotone functions. We validate our findings empirically on exemplar-based clustering, movie recommendation, DPP-based summarization, and maximum coverage in social networks, showing that fairness constraints do not significantly impact utility.
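To make the setting concrete, here is a minimal sketch of a single-pass, threshold-based streaming rule for monotone submodular maximization with per-group selection caps. This is only an illustration in the spirit of the abstract, not the authors' algorithm: the coverage objective, the fixed threshold `tau`, and the group caps are all assumptions introduced for this example.

```python
# Hedged sketch (assumption, not the paper's method): stream elements once,
# accept an element only if its marginal coverage gain clears a threshold
# and its sensitive group has not yet hit its fairness cap.

def coverage_gain(covered, item_points):
    """Marginal gain of adding an item under a set-coverage objective."""
    return len(item_points - covered)

def fair_threshold_stream(stream, group_of, caps, k, tau):
    """stream: iterable of (item_id, points_covered_by_item);
    group_of: item_id -> group label; caps: max picks per group;
    k: total budget; tau: marginal-gain acceptance threshold."""
    solution, covered = [], set()
    picked = {g: 0 for g in caps}
    for item_id, item_points in stream:
        g = group_of(item_id)
        # Respect the overall budget and the per-group fairness cap.
        if len(solution) >= k or picked[g] >= caps[g]:
            continue
        # Accept only if the element is valuable enough right now.
        if coverage_gain(covered, item_points) >= tau:
            solution.append(item_id)
            covered |= item_points
            picked[g] += 1
    return solution, covered

# Toy usage: items "a","b" belong to group "M"; "c","d" to group "F".
stream = [("a", {1, 2, 3}), ("b", {3, 4}), ("c", {5, 6, 7}), ("d", {8})]
group_of = lambda x: "M" if x in ("a", "b") else "F"
sol, cov = fair_threshold_stream(stream, group_of, {"M": 1, "F": 2}, k=3, tau=2)
# "b" is rejected by the group cap, "d" by the threshold.
```

A single fixed threshold is a simplification; threshold-based streaming algorithms typically run several thresholds in parallel to get a provable approximation guarantee.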
