论文标题

进化聚类可以具有理论保证吗?

Can Evolutionary Clustering Have Theoretical Guarantees?

论文作者

Qian, Chao

论文摘要

聚类是许多领域的基本问题,旨在根据某个距离度量将给定数据设置为组,因此同一组中的数据点相似,而不同组中的数据点不同。由于其重要性和NP硬度,已经提出了许多方法,其中进化算法是一类流行的方法。进化聚类发现了许多成功的应用,但是所有结果都是经验的,缺乏理论支持。本文通过证明GSEMO(一种简单的多目标进化算法的近似性能)来填补这一空白,以求解四种聚类的四种表述,即$ k $ -tmm,$ k $ - $ $ - $ $ k $ - $ k $ -median和$ k $ -means,可以保证,可以保证。此外,我们考虑在公平性下进行聚类,该聚类试图避免算法偏见,并且最近是机器学习中的重要研究主题。我们证明,对于在个人公平性下的离散$ k $ -Median聚类,GSEMO的近似性能在理论上可以保证相对于目标函数和公平性约束。

Clustering is a fundamental problem in many areas, which aims to partition a given data set into groups based on some distance measure, such that the data points in the same group are similar while that in different groups are dissimilar. Due to its importance and NP-hardness, a lot of methods have been proposed, among which evolutionary algorithms are a class of popular ones. Evolutionary clustering has found many successful applications, but all the results are empirical, lacking theoretical support. This paper fills this gap by proving that the approximation performance of the GSEMO (a simple multi-objective evolutionary algorithm) for solving four formulations of clustering, i.e., $k$-tMM, $k$-center, discrete $k$-median and $k$-means, can be theoretically guaranteed. Furthermore, we consider clustering under fairness, which tries to avoid algorithmic bias, and has recently been an important research topic in machine learning. We prove that for discrete $k$-median clustering under individual fairness, the approximation performance of the GSEMO can be theoretically guaranteed with respect to both the objective function and the fairness constraint.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源