Paper Title

Deep Fair Clustering via Maximizing and Minimizing Mutual Information: Theory, Algorithm and Metric

Authors

Pengxin Zeng, Yunfan Li, Peng Hu, Dezhong Peng, Jiancheng Lv, Xi Peng

Abstract

Fair clustering aims to divide data into distinct clusters while preventing sensitive attributes (e.g., gender, race, RNA sequencing technique) from dominating the clustering. Although many methods have been proposed recently with considerable success, most of them are heuristic, and a unified theory for algorithm design is lacking. In this work, we fill this gap by developing a mutual information theory for deep fair clustering and, accordingly, designing a novel algorithm dubbed FCMI. In brief, by maximizing and minimizing mutual information, FCMI is designed to achieve four characteristics highly expected of deep fair clustering, i.e., compact, balanced, and fair clusters, as well as informative features. Beyond the contributions to theory and algorithm, this work also proposes a novel fair clustering metric, likewise built upon information theory. Unlike existing evaluation metrics, our metric measures clustering quality and fairness as a whole rather than separately. To verify the effectiveness of the proposed FCMI, we conduct experiments on six benchmarks, including a single-cell RNA-seq atlas, comparing against 11 state-of-the-art methods in terms of five metrics. The code is available at https://pengxi.me.
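
To make the central quantity concrete, the following is a minimal sketch of how mutual information between two discrete labelings can be estimated from co-occurrence counts. It is an illustration only, not the authors' FCMI implementation or their proposed metric: the function name `mutual_information` and the toy arrays `clusters` and `groups` are assumptions introduced here. Intuitively, when cluster assignments carry no information about the sensitive attribute, this value is close to zero.

```python
import numpy as np

def mutual_information(a, b):
    """Empirical mutual information (in nats) between two discrete label arrays.

    Hypothetical helper for illustration; not taken from the FCMI code base.
    """
    a = np.asarray(a).ravel()
    b = np.asarray(b).ravel()
    # Map arbitrary labels to contiguous indices 0..k-1.
    _, a_idx = np.unique(a, return_inverse=True)
    _, b_idx = np.unique(b, return_inverse=True)
    # Build the joint count table, then normalise it to a joint distribution.
    joint = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(joint, (a_idx, b_idx), 1)
    joint /= joint.sum()
    # Marginals and their outer product (the joint under independence).
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    indep = pa @ pb
    # Sum p(x, y) * log(p(x, y) / (p(x) p(y))) over non-empty cells only.
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / indep[nz])))

# Toy example (assumed data): four samples, two clusters, two sensitive groups.
clusters = [0, 0, 1, 1]
groups = [0, 1, 0, 1]  # each cluster contains both groups equally
print(mutual_information(clusters, groups))    # 0.0: clusters reveal nothing about the group
print(mutual_information(clusters, clusters))  # log(2), about 0.693: fully dependent labelings
```

In this toy setting, a clustering that mixes the sensitive groups evenly scores zero, while a clustering that simply reproduces the groups scores the maximum for two balanced classes.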
