论文标题

大型有向图上的分布式D核分解

Distributed D-core Decomposition over Large Directed Graphs

论文作者

Liao, Xuankun, Liu, Qing, Jiang, Jiaxin, Huang, Xin, Xu, Jianliang, Choi, Byron

论文摘要

鉴于有向图的$ g $和整数$ k $和$ l $,D核是最大子图$ H \ subseteq g $,因此对于$ h $的每个顶点,其内度和超级不小于$ k $和$ l $。对于有向图的$ g $,D核分解的问题旨在计算所有可能的$ K $和$ L $的非空D核。在文献中,已经提出了几种\ emph {基于剥离的}算法来处理D核分解。但是,基于剥离的算法以顺序的方式工作并需要在处理过程中需要全局图信息,主要是为\ emph {Collemistion}设置设计的,该设置无法在分布式设置中有效地处理大规模图。由此激励,我们研究了本文\ emph {分布式} d核分解问题。我们首先定义一个称为\ emph {锚定的coreness}的概念,我们基于该概念提出了一种基于H-Index的新算法,用于分布式D核分解。此外,我们设计了一个新颖的概念,即\ emph {Skyline Coreness},并表明D核分解问题等于所有顶点的天际线菌度的计算。我们设计了一个有效的D索引来计算分布的天际线。我们在以顶点为中心和以块为中心的分布式图处理框架下实现了所提出的算法。此外,我们理论上分析了算法和消息复杂性。在数十亿个边缘的大型现实图表上进行了广泛的实验,证明了拟议算法的效率,从运行时间和交流开销方面。

Given a directed graph $G$ and integers $k$ and $l$, a D-core is the maximal subgraph $H \subseteq G$ such that for every vertex of $H$, its in-degree and out-degree are no smaller than $k$ and $l$, respectively. For a directed graph $G$, the problem of D-core decomposition aims to compute the non-empty D-cores for all possible values of $k$ and $l$. In the literature, several \emph{peeling-based} algorithms have been proposed to handle D-core decomposition. However, the peeling-based algorithms that work in a sequential fashion and require global graph information during processing are mainly designed for \emph{centralized} settings, which cannot handle large-scale graphs efficiently in distributed settings. Motivated by this, we study the \emph{distributed} D-core decomposition problem in this paper. We start by defining a concept called \emph{anchored coreness}, based on which we propose a new H-index-based algorithm for distributed D-core decomposition. Furthermore, we devise a novel concept, namely \emph{skyline coreness}, and show that the D-core decomposition problem is equivalent to the computation of skyline corenesses for all vertices. We design an efficient D-index to compute the skyline corenesses distributedly. We implement the proposed algorithms under both vertex-centric and block-centric distributed graph processing frameworks. Moreover, we theoretically analyze the algorithm and message complexities. Extensive experiments on large real-world graphs with billions of edges demonstrate the efficiency of the proposed algorithms in terms of both the running time and communication overhead.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源