如何在大规模数据模型中解决公平的$ k $中心

论文标题

如何在大规模数据模型中解决公平的$ k $中心

How to Solve Fair $k$-Center in Massive Data Models

论文作者

Chiplunkar, Ashish, Kale, Sagar, Ramamoorthy, Sivaramakrishnan Natarajan

论文摘要

在大量数据的推动下，重要的决策是在算法的帮助下自动化的，因此，算法中的公平性已成为一个特别重要的研究主题。在这项工作中，我们为公平的$ k $中心问题设计了新的流和分布式算法，该算法对公平数据摘要进行了建模。计算的流和分布式模型具有吸引人的功能，即能够处理不适合主内存的大量数据集。我们的主要贡献是：（a）第一个分布式算法；事实证明，它具有恒定的近似值，并且非常可行，并且（b）具有可证明的近似值的两通道流媒体算法匹配最著名的算法（这不是流算法）。我们的算法具有在实践中易于实现的优点，在线性运行时间中快速实现，工作记忆和通信非常小，并且在几个真实和综合数据集上的现有算法胜过了现有算法。为了补充我们的分布式算法，我们还为自然分布式算法提供了硬度结果，即使是$ k $ center的特殊情况，它也具有。

Fueled by massive data, important decision making is being automated with the help of algorithms, therefore, fairness in algorithms has become an especially important research topic. In this work, we design new streaming and distributed algorithms for the fair $k$-center problem that models fair data summarization. The streaming and distributed models of computation have an attractive feature of being able to handle massive data sets that do not fit into main memory. Our main contributions are: (a) the first distributed algorithm; which has provably constant approximation ratio and is extremely parallelizable, and (b) a two-pass streaming algorithm with a provable approximation guarantee matching the best known algorithm (which is not a streaming algorithm). Our algorithms have the advantages of being easy to implement in practice, being fast with linear running times, having very small working memory and communication, and outperforming existing algorithms on several real and synthetic data sets. To complement our distributed algorithm, we also give a hardness result for natural distributed algorithms, which holds for even the special case of $k$-center.

下载PDF全文

下载文献需遵守相关版权规定

论文标题