论文标题

超图和两部分图中的参数化相关聚类

Parameterized Correlation Clustering in Hypergraphs and Bipartite Graphs

论文作者

Veldt, Nate, Wirth, Anthony, Gleich, David F.

论文摘要

通过在社区检测和密集的子图发现中应用的动机,我们考虑了超图和两分图中的新聚类目标。这些目标是通过一个或多个分辨率参数参数化的,以便在复杂数据中启用各种知识发现。 对于超图和两分目标,我们确定了等效于现有目标并共享其(多项式时间)近似算法的参数制度。我们首先表明我们的参数化超图相关聚类目标与超绘图中的归一化切割和模块化的高阶概念有关。它可以通过超边膨胀技术进一步适合近似算法。 我们的参数化两部分相关聚类目标概括了标准未加权的两分相关群集以及双分解删除。对于一定的参数选择,它也与我们的HyperGraph目标有关。尽管它通常是NP- hard,但我们重点介绍了两部分物镜的参数制度,其中该问题减少到了双分部分匹配问题,因此可以在多项式时间内解决。对于其他参数设置,我们使用线性程序四舍五入技术提出近似算法。这些结果使我们能够引入第一个恒定因子近似以进行双升删除,即去除最小数量的边缘以将双分部分图划分为不相关的双关联bi-cliques的任务。 在一些实验结果中,我们强调了框架的灵活性以及在不同参数设置中可以获得的结果的多样性。这包括在一系列参数上群集的两分图,在电子邮件网络和食品网中检测富含主题的集群,以及在产品审查HyperGraph中形成零售产品集群,这些零售产品与已知产品类别高度相关。

Motivated by applications in community detection and dense subgraph discovery, we consider new clustering objectives in hypergraphs and bipartite graphs. These objectives are parameterized by one or more resolution parameters in order to enable diverse knowledge discovery in complex data. For both hypergraph and bipartite objectives, we identify parameter regimes that are equivalent to existing objectives and share their (polynomial-time) approximation algorithms. We first show that our parameterized hypergraph correlation clustering objective is related to higher-order notions of normalized cut and modularity in hypergraphs. It is further amenable to approximation algorithms via hyperedge expansion techniques. Our parameterized bipartite correlation clustering objective generalizes standard unweighted bipartite correlation clustering, as well as bicluster deletion. For a certain choice of parameters it is also related to our hypergraph objective. Although in general it is NP-hard, we highlight a parameter regime for the bipartite objective where the problem reduces to the bipartite matching problem and thus can be solved in polynomial time. For other parameter settings, we present approximation algorithms using linear program rounding techniques. These results allow us to introduce the first constant-factor approximation for bicluster deletion, the task of removing a minimum number of edges to partition a bipartite graph into disjoint bi-cliques. In several experimental results, we highlight the flexibility of our framework and the diversity of results that can be obtained in different parameter settings. This includes clustering bipartite graphs across a range of parameters, detecting motif-rich clusters in an email network and a food web, and forming clusters of retail products in a product review hypergraph, that are highly correlated with known product categories.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源