Evaluating Abstract Asynchronous Schwarz Solvers on GPUs

Paper Title

Evaluating Abstract Asynchronous Schwarz solvers on GPUs

Authors

Pratik Nayak, Terry Cojean, Hartwig Anzt

Abstract

With the commencement of the exascale computing era, the majority of leadership supercomputers are heterogeneous and massively parallel, even within a single node, which combines multiple co-processors such as GPUs with many CPU cores. For example, each node of ORNL's Summit contains six NVIDIA Tesla V100 GPUs and 42 IBM POWER9 cores. Synchronizing across all these compute resources within a single node, let alone across multiple nodes, is prohibitively expensive. Hence, it is necessary to develop and study asynchronous algorithms that circumvent the bulk-synchronous bottleneck at this scale of parallelism. In this study, we examine the asynchronous version of the abstract Restricted Additive Schwarz method as a solver: we do not synchronize explicitly, but instead let the data exchange between subdomains proceed completely asynchronously, thereby removing the bulk-synchronous nature of the algorithm. We accomplish this using the one-sided Remote Memory Access (RMA) functions of the MPI standard. We study the benefits of such an asynchronous solver over its synchronous counterpart on both multi-core architectures and multiple GPUs. We also study the communication patterns and local solvers, and their effect on the global solver. Finally, we show that this concept can deliver attractive runtime benefits over the synchronous counterparts.
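To make the difference between synchronous and asynchronous Restricted Additive Schwarz (RAS) concrete, here is a minimal serial Python sketch. It is not the authors' implementation, which uses MPI one-sided RMA on CPUs and GPUs. Assumptions, all hypothetical: a 1D Poisson model problem with matrix tridiag(-1, 2, -1), zero Dirichlet ends and a right-hand side of ones; four subdomains with an overlap of four points; exact local solves via the Thomas algorithm. Every function name here (`ras_sync`, `ras_async`, `local_solve`, ...) is illustrative. In `ras_sync`, all subdomains read the same iterate and write only their owned ("restricted") part after every read has completed, mimicking a barrier. In `ras_async`, subdomains update in a random order, some skip a round to imitate slower processes, and each write is visible to neighbours immediately, loosely imitating unsynchronized one-sided puts into a shared window. The global residual check is a serial convenience; a real asynchronous solver needs its own distributed convergence detection.

```python
import random


def thomas_solve(rhs):
    """Solve tridiag(-1, 2, -1) x = rhs with the Thomas algorithm."""
    m = len(rhs)
    c = [0.0] * m  # modified super-diagonal
    d = [0.0] * m  # modified right-hand side
    c[0], d[0] = -0.5, rhs[0] / 2.0
    for i in range(1, m):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (rhs[i] + d[i - 1]) / denom
    x = [0.0] * m
    x[-1] = d[-1]
    for i in range(m - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def residual_max(u, b):
    """Max-norm of b - A u for A = tridiag(-1, 2, -1), zero Dirichlet ends."""
    n = len(u)
    r = 0.0
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        r = max(r, abs(b[i] - (2.0 * u[i] - left - right)))
    return r


def make_subdomains(n, parts, overlap):
    """Owned ranges [lo, hi] plus overlapping extensions [ext_lo, ext_hi]."""
    size = n // parts
    subs = []
    for k in range(parts):
        lo = k * size
        hi = n - 1 if k == parts - 1 else (k + 1) * size - 1
        subs.append((lo, hi, max(0, lo - overlap), min(n - 1, hi + overlap)))
    return subs


def local_solve(u, b, sub):
    """Exact solve on the extended subdomain, restricted to the owned part."""
    lo, hi, ext_lo, ext_hi = sub
    rhs = b[ext_lo:ext_hi + 1]  # list slice is a copy, b is untouched
    # Dirichlet data taken from whatever the current iterate holds outside
    if ext_lo > 0:
        rhs[0] += u[ext_lo - 1]
    if ext_hi < len(u) - 1:
        rhs[-1] += u[ext_hi + 1]
    x = thomas_solve(rhs)
    return x[lo - ext_lo:hi - ext_lo + 1]  # "restricted": drop the overlap


def ras_sync(b, subs, tol=1e-9, max_sweeps=10000):
    u = [0.0] * len(b)
    for sweep in range(1, max_sweeps + 1):
        # All subdomains read the same iterate u^k ...
        updates = [(s, local_solve(u, b, s)) for s in subs]
        # ... and write only after every read is done (implicit barrier)
        for (lo, hi, _, _), vals in updates:
            u[lo:hi + 1] = vals
        if residual_max(u, b) < tol:
            return u, sweep
    return u, max_sweeps


def ras_async(b, subs, tol=1e-9, max_sweeps=10000, p_active=0.7, seed=0):
    rng = random.Random(seed)
    u = [0.0] * len(b)
    for sweep in range(1, max_sweeps + 1):
        order = list(subs)
        rng.shuffle(order)
        for s in order:
            if rng.random() < p_active:  # skipped = a "slow" subdomain
                lo, hi, _, _ = s
                # No barrier: the write is visible to neighbours at once
                u[lo:hi + 1] = local_solve(u, b, s)
        if residual_max(u, b) < tol:  # serial global check, for simplicity
            return u, sweep
    return u, max_sweeps


n = 64
b = [1.0] * n
exact = [(i + 1) * (n - i) / 2.0 for i in range(n)]  # u_j = j(n+1-j)/2
subs = make_subdomains(n, parts=4, overlap=4)
u_s, k_s = ras_sync(b, subs)
u_a, k_a = ras_async(b, subs)
err_s = max(abs(x - y) for x, y in zip(u_s, exact))
err_a = max(abs(x - y) for x, y in zip(u_a, exact))
print(f"sync:  {k_s} sweeps, max error {err_s:.2e}")
print(f"async: {k_a} sweeps, max error {err_a:.2e}")
```

For this model problem both variants converge to the discrete solution u_j = j(n+1-j)/2. A serial simulation like this cannot capture the cost of real synchronization, so it says nothing about the runtime benefits reported in the paper; it only illustrates where the barrier sits in the synchronous variant and how the asynchronous variant removes it.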
