Paper Title
MGG: Accelerating Graph Neural Networks with Fine-Grained Intra-Kernel Communication-Computation Pipelining on Multi-GPU Platforms
Paper Authors
Paper Abstract
The increasing size of input graphs for graph neural networks (GNNs) highlights the demand for using multi-GPU platforms. However, existing multi-GPU GNN systems optimize the computation and communication individually based on the conventional practice of scaling dense DNNs. For irregularly sparse and fine-grained GNN workloads, such solutions miss the opportunity to jointly schedule/optimize the computation and communication operations for high-performance delivery. To this end, we propose MGG, a novel system design to accelerate full-graph GNNs on multi-GPU platforms. The core of MGG is its novel dynamic software pipeline to facilitate fine-grained computation-communication overlapping within a GPU kernel. Specifically, MGG introduces GNN-tailored pipeline construction and GPU-aware pipeline mapping to facilitate workload balancing and operation overlapping. MGG also incorporates an intelligent runtime design with analytical modeling and optimization heuristics to dynamically improve the execution performance. Extensive evaluation reveals that MGG outperforms state-of-the-art full-graph GNN systems across various settings: on average 4.41X, 4.81X, and 10.83X faster than DGL, MGG-UVM, and ROC, respectively.
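To make the abstract's key idea concrete, below is a minimal CUDA sketch (not MGG's actual code) of fine-grained intra-kernel communication-computation overlap for neighbor aggregation. It assumes CUDA peer-to-peer access is enabled so a kernel can directly dereference a remote GPU's memory; the names `row_ptr`/`col_idx` (CSR adjacency), `owner` (which GPU holds each node's features), and `feat_ptrs` (per-GPU feature shards) are hypothetical. Each warp aggregates one node: while it consumes the neighbor feature already in registers, the load of the next (possibly remote) feature is in flight, so remote traffic overlaps local compute.

```cuda
#include <cuda_runtime.h>

#define DIM 32  // feature dimension; one lane handles one element

// Hypothetical pipelined aggregation kernel: one warp per destination node.
// feat_ptrs[p] is GPU p's feature shard; owner[u] names the GPU holding u.
// With P2P enabled, feat_ptrs[owner[u]] may point into a peer GPU's memory,
// so the load below is the "communication" being overlapped with compute.
__global__ void aggregate_pipelined(const int *row_ptr, const int *col_idx,
                                    const int *owner,
                                    float *const *feat_ptrs,
                                    float *out, int num_nodes) {
    int warp = (blockIdx.x * blockDim.x + threadIdx.x) / 32;
    int lane = threadIdx.x % 32;
    if (warp >= num_nodes) return;

    int beg = row_ptr[warp], end = row_ptr[warp + 1];
    float acc = 0.0f;

    // Prologue: issue the load of the first neighbor's feature.
    float cur = 0.0f;
    if (beg < end) {
        int u = col_idx[beg];
        cur = feat_ptrs[owner[u]][u * DIM + lane];
    }
    for (int e = beg; e < end; ++e) {
        // Software pipelining: kick off the next (possibly remote) load...
        float nxt = 0.0f;
        if (e + 1 < end) {
            int u = col_idx[e + 1];
            nxt = feat_ptrs[owner[u]][u * DIM + lane];
        }
        // ...while the already-fetched feature is consumed; the hardware
        // scoreboard lets this add proceed before the in-flight load lands.
        acc += cur;
        cur = nxt;
    }
    out[warp * DIM + lane] = acc;  // sum aggregation; normalization omitted
}
```

This register double-buffering is only the essential overlap pattern the abstract describes; MGG itself additionally performs GNN-tailored pipeline construction, GPU-aware mapping of pipeline stages for workload balancing, and runtime tuning, as the abstract notes.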