Paper Title
Energy-Efficiency Evaluation of OpenMP Loop Transformations and Runtime Constructs
Paper Authors
Paper Abstract
OpenMP is the de facto API for parallel programming in HPC applications. These programs are often run in data centers, where energy consumption is a major concern. Whereas previous work has focused almost entirely on performance, here we analyse aspects of OpenMP from an energy-consumption perspective. The analysis is carried out by executing novel microbenchmarks and common benchmark suites on data center nodes and measuring their energy consumption. Three main aspects are analysed: directive-generated loop tiling and unrolling, parallel for loops versus explicit tasking, and the policy for handling blocked threads. For loop tiling and unrolling, we find that tiling can yield significant energy savings for some, mostly unoptimised, programs, while directive-generated unrolling provides only a very minor improvement in the best case and degrades performance substantially in the worst case. For the second aspect, we find that parallel for loops yield better results than explicit tasking loops in cases where both can be used, and the difference becomes more prominent with more fine-grained workloads. For the third aspect, we find that significant energy savings can be made by not descheduling waiting threads but instead having them spin, at the cost of higher power consumption. We also analyse how the choice of compiler affects the above questions by compiling programs with each of ICC, Clang and GCC, and find that while none is strictly better than the others, they can produce very different results for the same program. As a final step, we combine the findings of all results and suggest novel compiler directives as well as general recommendations on how to reduce energy consumption in OpenMP programs.
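For readers unfamiliar with the constructs named in the abstract, the following is a minimal illustrative sketch in C, not code from the paper. It assumes that "directive-generated" tiling and unrolling refers to the OpenMP 5.1 `tile` and `unroll` directives, which need a recent compiler. The function names, the tile size of 32, the unroll factor of 4 and the grain size of 1024 are all hypothetical choices made for illustration. Without OpenMP support enabled, the pragmas are ignored and the functions run serially with the same results.

```c
/* Illustrative sketch only: all names and tuning values are hypothetical. */

/* Aspect 1a: loop tiling requested by directive (OpenMP 5.1 `tile`).
 * Transposes an n x n row-major matrix in 32 x 32 blocks. */
void transpose_tiled(const double *in, double *out, int n) {
    #pragma omp tile sizes(32, 32)
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            out[j * n + i] = in[i * n + j];
}

/* Aspect 1b: loop unrolling requested by directive (OpenMP 5.1 `unroll`).
 * Scales every element of v by s, unrolling the loop by a factor of 4. */
void scale_unrolled(double s, double *v, int n) {
    #pragma omp unroll partial(4)
    for (int i = 0; i < n; i++)
        v[i] *= s;
}

/* Aspect 2a: work-sharing loop. Iterations are divided among the
 * threads of the team created by `parallel`. Computes y = a*x + y. */
void axpy_parallel_for(double a, const double *x, double *y, int n) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Aspect 2b: the same loop expressed with explicit tasks. One thread
 * creates tasks of roughly `grainsize` iterations each; the other
 * threads of the team execute them. */
void axpy_taskloop(double a, const double *x, double *y, int n) {
    #pragma omp parallel
    #pragma omp single
    #pragma omp taskloop grainsize(1024)
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

The third aspect, the policy for blocked threads, is usually not expressed in source code. In standard OpenMP it can be set through the `OMP_WAIT_POLICY` environment variable, where `active` asks waiting threads to spin and `passive` asks them to yield or sleep. Whether the paper used this mechanism or an implementation-specific one is not stated in the abstract.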