可扩展的细粒平行周期枚举算法

论文标题

可扩展的细粒平行周期枚举算法

Scalable Fine-Grained Parallel Cycle Enumeration Algorithms

论文作者

Blanuša, Jovan, Ienne, Paolo, Atasu, Kubilay

论文摘要

列举简单周期在计算生物学，网络科学和金融犯罪分析中具有重要的应用。在这项工作中，我们着重于约翰逊和雷德·塔扬（Johnson）的最新简单循环枚举算法以及它们对时间表的应用以及它们的应用。据我们所知，我们是第一个以细粒度的方式平行这两种算法的人。我们也是第一个在实验上展示线性性能缩放的人。通过我们将长顺序搜索分解为细粒度的任务，使这种缩放成为可能，然后将其动态地安排在CPU内核上，从而实现了最佳的负载平衡。此外，我们表明，约翰逊的粗粒平行版和读取的tarjan算法，这些算法利用边缘或顶点级并行性是不可扩展的。在四个多核CPU的集群中，物理核心$ 256，我们的细粒平行算法平均比其粗粒平行的算法快。当我们使用更多的CPU核心时，细粒和粗粒平行算法之间的性能差距扩大。当使用所有256个CPU内核时，我们的平行算法平均列举了时间周期，平均$ 260 \ tims $ $ $ $比Kumar和Calders的串行算法快。

Enumerating simple cycles has important applications in computational biology, network science, and financial crime analysis. In this work, we focus on parallelising the state-of-the-art simple cycle enumeration algorithms by Johnson and Read-Tarjan along with their applications to temporal graphs. To our knowledge, we are the first ones to parallelise these two algorithms in a fine-grained manner. We are also the first to demonstrate experimentally a linear performance scaling. Such a scaling is made possible by our decomposition of long sequential searches into fine-grained tasks, which are then dynamically scheduled across CPU cores, enabling an optimal load balancing. Furthermore, we show that coarse-grained parallel versions of the Johnson and the Read-Tarjan algorithms that exploit edge- or vertex-level parallelism are not scalable. On a cluster of four multi-core CPUs with $256$ physical cores, our fine-grained parallel algorithms are, on average, an order of magnitude faster than their coarse-grained parallel counterparts. The performance gap between the fine-grained and the coarse-grained parallel algorithms widens as we use more CPU cores. When using all 256 CPU cores, our parallel algorithms enumerate temporal cycles, on average, $260\times$ faster than the serial algorithm of Kumar and Calders.

下载PDF全文

下载文献需遵守相关版权规定

论文标题