Paper Title
The OpenMP Cluster Programming Model
Paper Authors
Abstract
Despite various research initiatives and proposed programming models, efficient solutions for parallel programming in HPC clusters still rely on a complex combination of different programming models (e.g., OpenMP and MPI), languages (e.g., C++ and CUDA), and specialized runtimes (e.g., Charm++ and Legion). On the other hand, task parallelism has proven to be an efficient and seamless programming model for clusters. This paper introduces OpenMP Cluster (OMPC), a task-parallel model that extends OpenMP for cluster programming. OMPC leverages OpenMP's offloading standard to distribute annotated regions of code across the nodes of a distributed system. To achieve this, it hides MPI-based data distribution and load-balancing mechanisms behind OpenMP task dependencies. Given its compliance with OpenMP, OMPC allows applications to use the same programming model to exploit intra- and inter-node parallelism, thus simplifying the development process and maintenance. We evaluated OMPC using Task Bench, a synthetic benchmark focused on task parallelism, and compared its performance against other distributed runtimes. Experimental results show that OMPC can deliver up to 1.53x and 2.43x better performance than Charm++ on the CCR and scalability experiments, respectively. Experiments also show that OMPC exhibits good weak scaling for both Task Bench and a real-world seismic imaging application.
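To make the abstract's claim concrete, the sketch below shows the kind of standard OpenMP offloading code that a model like OMPC is described as reusing: annotated target regions whose `depend` clauses express the task graph that the runtime would use to schedule work and move data across nodes. This is a minimal, hypothetical illustration based only on the OpenMP specification; the array names, sizes, and kernel bodies are placeholders, and nothing here reflects OMPC-specific clauses or configuration beyond what the abstract states.

```c
#include <stdio.h>

#define N 1024

static float a[N], b[N];

int main(void) {
    #pragma omp parallel
    #pragma omp single
    {
        /* Producer task: initializes a[]. Under a cluster-aware runtime,
         * this annotated region could be executed on a remote node. */
        #pragma omp target nowait map(tofrom: a[0:N]) depend(out: a)
        for (int i = 0; i < N; ++i)
            a[i] = (float)i;

        /* Consumer task: the depend clauses order it after the producer;
         * per the abstract, that dependency is what would drive the hidden,
         * MPI-based data movement between nodes. */
        #pragma omp target nowait map(to: a[0:N]) map(from: b[0:N]) \
                depend(in: a) depend(out: b)
        for (int i = 0; i < N; ++i)
            b[i] = 2.0f * a[i];

        /* Wait for both offloaded tasks to complete. */
        #pragma omp taskwait
    }

    printf("b[10] = %f\n", b[10]);
    return 0;
}
```

The point of the example is the claimed single-model workflow: the same `target`/`depend` annotations that express intra-node offloading are, in OMPC's description, enough for the runtime to distribute tasks across a cluster without explicit MPI calls in the application code.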