论文标题
可扩展的细粒平行周期枚举算法
Scalable Fine-Grained Parallel Cycle Enumeration Algorithms
论文作者
论文摘要
列举简单周期在计算生物学,网络科学和金融犯罪分析中具有重要的应用。在这项工作中,我们着重于约翰逊和雷德·塔扬(Johnson)的最新简单循环枚举算法以及它们对时间表的应用以及它们的应用。据我们所知,我们是第一个以细粒度的方式平行这两种算法的人。我们也是第一个在实验上展示线性性能缩放的人。通过我们将长顺序搜索分解为细粒度的任务,使这种缩放成为可能,然后将其动态地安排在CPU内核上,从而实现了最佳的负载平衡。此外,我们表明,约翰逊的粗粒平行版和读取的tarjan算法,这些算法利用边缘或顶点级并行性是不可扩展的。在四个多核CPU的集群中,物理核心$ 256,我们的细粒平行算法平均比其粗粒平行的算法快。当我们使用更多的CPU核心时,细粒和粗粒平行算法之间的性能差距扩大。当使用所有256个CPU内核时,我们的平行算法平均列举了时间周期,平均$ 260 \ tims $ $ $ $比Kumar和Calders的串行算法快。
Enumerating simple cycles has important applications in computational biology, network science, and financial crime analysis. In this work, we focus on parallelising the state-of-the-art simple cycle enumeration algorithms by Johnson and Read-Tarjan along with their applications to temporal graphs. To our knowledge, we are the first ones to parallelise these two algorithms in a fine-grained manner. We are also the first to demonstrate experimentally a linear performance scaling. Such a scaling is made possible by our decomposition of long sequential searches into fine-grained tasks, which are then dynamically scheduled across CPU cores, enabling an optimal load balancing. Furthermore, we show that coarse-grained parallel versions of the Johnson and the Read-Tarjan algorithms that exploit edge- or vertex-level parallelism are not scalable. On a cluster of four multi-core CPUs with $256$ physical cores, our fine-grained parallel algorithms are, on average, an order of magnitude faster than their coarse-grained parallel counterparts. The performance gap between the fine-grained and the coarse-grained parallel algorithms widens as we use more CPU cores. When using all 256 CPU cores, our parallel algorithms enumerate temporal cycles, on average, $260\times$ faster than the serial algorithm of Kumar and Calders.