Paper Title
The OpenMP Cluster Programming Model
Paper Authors
Abstract
Despite various research initiatives and proposed programming models, efficient solutions for parallel programming in HPC clusters still rely on a complex combination of different programming models (e.g., OpenMP and MPI), languages (e.g., C++ and CUDA), and specialized runtimes (e.g., Charm++ and Legion). On the other hand, task parallelism has proven to be an efficient and seamless programming model for clusters. This paper introduces OpenMP Cluster (OMPC), a task-parallel model that extends OpenMP for cluster programming. OMPC leverages OpenMP's offloading standard to distribute annotated regions of code across the nodes of a distributed system. To achieve this, it hides MPI-based data distribution and load-balancing mechanisms behind OpenMP task dependencies. Given its compliance with OpenMP, OMPC allows applications to use the same programming model to exploit intra- and inter-node parallelism, thus simplifying the development process and maintenance. We evaluated OMPC using Task Bench, a synthetic benchmark focused on task parallelism, and compared its performance against other distributed runtimes. Experimental results show that OMPC can deliver up to 1.53x and 2.43x better performance than Charm++ on the CCR and scalability experiments, respectively. Experiments also show that OMPC exhibits good weak scaling for both Task Bench and a real-world seismic imaging application.
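To make the abstract's claim concrete, the sketch below shows the kind of standard OpenMP offloading code that a model like OMPC is described as reusing: annotated target regions whose `depend` clauses express the task graph that the runtime would use to schedule work and move data across nodes. This is a minimal, hypothetical illustration based only on the OpenMP specification; the array names, sizes, and kernel bodies are placeholders, and nothing here reflects OMPC-specific clauses or configuration beyond what the abstract states.

```c
#include <stdio.h>

#define N 1024

static float a[N], b[N];

int main(void) {
    #pragma omp parallel
    #pragma omp single
    {
        /* Producer task: initializes a[]. Under a cluster-aware runtime,
         * this annotated region could be executed on a remote node. */
        #pragma omp target nowait map(tofrom: a[0:N]) depend(out: a)
        for (int i = 0; i < N; ++i)
            a[i] = (float)i;

        /* Consumer task: the depend clauses order it after the producer;
         * per the abstract, that dependency is what would drive the hidden,
         * MPI-based data movement between nodes. */
        #pragma omp target nowait map(to: a[0:N]) map(from: b[0:N]) \
                depend(in: a) depend(out: b)
        for (int i = 0; i < N; ++i)
            b[i] = 2.0f * a[i];

        /* Wait for both offloaded tasks to complete. */
        #pragma omp taskwait
    }

    printf("b[10] = %f\n", b[10]);
    return 0;
}
```

The point of the example is the claimed single-model workflow: the same `target`/`depend` annotations that express intra-node offloading are, in OMPC's description, enough for the runtime to distribute tasks across a cluster without explicit MPI calls in the application code.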