Paper Title

Scaling Up Distance-generalized Core Decomposition

Paper Authors

Qiangqiang Dai, Rong-Hua Li, Lu Qin, Guoren Wang, Weihua Yang, Zhiwei Zhang, Ye Yuan

Paper Abstract

Core decomposition is a fundamental operator in network analysis. In this paper, we study the problem of computing distance-generalized core decomposition on a network. A distance-generalized core, also termed $(k, h)$-core, is a maximal subgraph in which every vertex has at least $k$ other vertices at distance no larger than $h$. The state-of-the-art algorithm for solving this problem is based on a peeling technique which iteratively removes the vertex (denoted by $v$) from the graph that has the smallest $h$-degree. The $h$-degree of a vertex $v$ denotes the number of other vertices that are reachable from $v$ within $h$ hops. Such a peeling algorithm, however, needs to frequently recompute the $h$-degrees of $v$'s neighbors after deleting $v$, which is typically very costly for a large $h$. To overcome this limitation, we propose an efficient peeling algorithm based on a novel $h$-degree updating technique. Instead of recomputing the $h$-degrees, our algorithm can dynamically maintain the $h$-degrees for all vertices via exploring a very small subgraph, after peeling a vertex. We show that such an $h$-degree updating procedure can be efficiently implemented by an elegant bitmap technique. In addition, we also propose a sampling-based algorithm and a parallelization technique to further improve the efficiency. Finally, we conduct extensive experiments on 12 real-world graphs to evaluate our algorithms. The results show that, when $h\ge 3$, our exact and sampling-based algorithms can achieve up to $10\times$ and $100\times$ speedup over the state-of-the-art algorithm, respectively.
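The sketch below illustrates, in Python, the baseline peeling strategy the abstract describes: compute each vertex's $h$-degree by a BFS truncated at depth $h$, then repeatedly peel the vertex with the smallest $h$-degree. It deliberately uses the naive recomputation of affected $h$-degrees that the paper's updating technique is designed to avoid. The function names (`vertices_within_h`, `kh_core_decomposition`) and the dict-of-sets graph representation are illustrative choices, not the authors' implementation.

```python
from collections import deque

def vertices_within_h(adj, v, h):
    """Vertices reachable from v within h hops (excluding v), via BFS truncated at depth h."""
    seen = {v}
    queue = deque([(v, 0)])
    while queue:
        u, d = queue.popleft()
        if d == h:
            continue
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append((w, d + 1))
    seen.discard(v)
    return seen

def kh_core_decomposition(adj, h):
    """Baseline peeling for distance-generalized core decomposition.

    adj: dict mapping each vertex to a set of neighbours (undirected graph).
    Returns a dict mapping each vertex to its (k, h)-core number, i.e. the
    largest k such that the vertex belongs to the (k, h)-core.
    """
    adj = {v: set(nbrs) for v, nbrs in adj.items()}   # work on a copy
    alive = set(adj)
    deg = {v: len(vertices_within_h(adj, v, h)) for v in alive}
    core, k = {}, 0
    while alive:
        v = min(alive, key=deg.get)   # vertex with the smallest current h-degree
        k = max(k, deg[v])
        core[v] = k
        # only vertices within h hops of v can lose v from their h-neighbourhood
        affected = vertices_within_h(adj, v, h)
        alive.remove(v)
        for u in adj[v]:
            adj[u].discard(v)
        adj[v] = set()
        # naive recomputation of the affected h-degrees: this repeated truncated BFS
        # is the bottleneck that the paper's h-degree updating technique avoids
        for u in affected:
            deg[u] = len(vertices_within_h(adj, u, h))
    return core
```

With a bucket queue keyed by $h$-degree this matches the standard peeling framework; the paper's contribution replaces the repeated BFS recomputation with an incremental update over a small subgraph, implemented with a bitmap representation of $h$-hop neighborhoods.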
