探索GPU上静态和增量图连接算法的设计空间

论文标题

探索GPU上静态和增量图连接算法的设计空间

Exploring the Design Space of Static and Incremental Graph Connectivity Algorithms on GPUs

论文作者

Hong, Changwan, Dhulipala, Laxman, Shun, Julian

论文摘要

连接的组件和跨越森林是基本的图算法，因为它们在许多重要的应用中（例如图形聚类和图像分割）中使用。 GPU是图形算法的理想平台，因为它们的峰值性能高和记忆带宽。尽管文献中存在几种GPU连接算法，但尚未探索许多设计选择。在本文中，我们探讨了GPU连接算法中的各种设计选择，包括用于静态和增量设置的采样，链接和树压缩。我们的各种设计选择导致300多个新的GPU连接实现，其中许多选择都超过了最先进的连接。我们提出了实验评估，并表明我们在现有静态算法上达到了2.47倍加速的平均加速。在增量设置中，我们达到每秒高达482.3亿个边缘的吞吐量。与72核机上的最先进的CPU实现相比，我们使用TESLA V100 GPU实现了8.26--14.51x的速度为8.26--14.51x，用于增量连接的1.85--13.36x。

Connected components and spanning forest are fundamental graph algorithms due to their use in many important applications, such as graph clustering and image segmentation. GPUs are an ideal platform for graph algorithms due to their high peak performance and memory bandwidth. While there exist several GPU connectivity algorithms in the literature, many design choices have not yet been explored. In this paper, we explore various design choices in GPU connectivity algorithms, including sampling, linking, and tree compression, for both the static as well as the incremental setting. Our various design choices lead to over 300 new GPU implementations of connectivity, many of which outperform state-of-the-art. We present an experimental evaluation, and show that we achieve an average speedup of 2.47x speedup over existing static algorithms. In the incremental setting, we achieve a throughput of up to 48.23 billion edges per second. Compared to state-of-the-art CPU implementations on a 72-core machine, we achieve a speedup of 8.26--14.51x for static connectivity and 1.85--13.36x for incremental connectivity using a Tesla V100 GPU.

下载PDF全文

下载文献需遵守相关版权规定

论文标题