ComPar: Optimized Multi-Compiler for Automatic OpenMP S2S Parallelization

Paper Title

ComPar: Optimized Multi-Compiler for Automatic OpenMP S2S Parallelization

Authors

Idan Mosseri, Lee-or Alon, Re'em Harel, Gal Oren

Abstract

Parallelization schemes are essential in order to exploit the full benefits of multi-core architectures. In such architectures, the most comprehensive parallelization API is OpenMP. However, introducing correct and optimal OpenMP parallelization into applications is not always a simple task, due to common parallel management pitfalls, architecture heterogeneity, and the current need for human expertise to comprehend many fine details and abstract correlations. To ease this process, many automatic parallelization compilers were created over the last decade. Harel et al. [2020] tested several source-to-source compilers and concluded that each has its advantages and disadvantages and that no compiler is superior to all others in all tests. This indicates that a fusion of the compilers' best outputs under the best hyper-parameters for the current hardware setup can yield greater speedups. To create such a fusion, one should execute a computationally intensive hyper-parameter sweep, in which the performance of each option is estimated and the best option is chosen. We created a novel source-to-source parallelization multi-compiler named ComPar, which uses code segmentation-and-fusion with hyper-parameter tuning to achieve the best parallel code possible without any human intervention while maintaining the program's validity. In this paper we present ComPar and analyze its results on the NAS and PolyBench benchmarks. We conclude that although the resources ComPar requires to produce parallel code are greater than those of other source-to-source parallelization compilers (they depend on the number of parameters the user wishes to consider, and their combinations), ComPar achieves superior performance overall compared to the serial code version and the other tested parallelization compilers. ComPar is publicly available at: https://github.com/Scientific-Computing-Lab-NRCN/compar.
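
The sweep-and-fuse idea in the abstract can be illustrated with a small sketch. This is not ComPar's actual implementation: the function `sweep`, the callback `fake_measure`, the segment names, the compiler names `compiler_x` and `compiler_y`, and all timings are hypothetical placeholders chosen for illustration. The sketch assumes the program has already been split into segments and that each (compiler, flags) combination can be timed per segment, with `None` standing for a combination that failed or broke the program's validity.

```python
from itertools import product


def sweep(segments, compilers, flag_sets, measure):
    """For each segment, time every (compiler, flags) combination and keep the fastest.

    `measure(segment, compiler, flags)` is a hypothetical callback that returns
    a runtime in seconds, or None if the combination failed or was invalid.
    """
    best = {}
    for seg in segments:
        # Start from the serial version, which is assumed to be always valid.
        best_choice, best_time = ("serial", ()), measure(seg, "serial", ())
        for compiler, flags in product(compilers, flag_sets):
            t = measure(seg, compiler, flags)
            if t is not None and t < best_time:
                best_choice, best_time = (compiler, flags), t
        best[seg] = (best_choice, best_time)
    return best


# Made-up timings (seconds) standing in for real benchmark runs.
timings = {
    ("loop_a", "serial", ()): 10.0,
    ("loop_a", "compiler_x", ("static",)): 4.0,
    ("loop_a", "compiler_x", ("dynamic",)): 5.0,
    ("loop_a", "compiler_y", ("static",)): 6.0,
    ("loop_a", "compiler_y", ("dynamic",)): None,  # invalid output
    ("loop_b", "serial", ()): 2.0,
    ("loop_b", "compiler_x", ("static",)): 3.0,
    ("loop_b", "compiler_x", ("dynamic",)): 2.5,
    ("loop_b", "compiler_y", ("static",)): None,   # failed to compile
    ("loop_b", "compiler_y", ("dynamic",)): 2.2,
}


def fake_measure(segment, compiler, flags):
    return timings.get((segment, compiler, flags))


result = sweep(
    ["loop_a", "loop_b"],
    ["compiler_x", "compiler_y"],
    [("static",), ("dynamic",)],
    fake_measure,
)
for seg, (choice, seconds) in result.items():
    print(seg, choice, seconds)
```

In this toy data, one segment is fastest under a parallelizing compiler while the other is best left serial, which is the kind of per-segment choice a fused output would combine. The number of measurements grows with the product of compilers and flag sets, which matches the abstract's remark that the cost depends on how many parameters and combinations the user considers.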
