Paper Title
Automatic Parallelization of Python Programs for Distributed Heterogeneous Computing
Paper Authors
Abstract
This paper introduces a novel approach to automatic ahead-of-time (AOT) parallelization and optimization of sequential Python programs for execution on distributed heterogeneous platforms. Our approach enables AOT source-to-source transformation of Python programs, driven by the inclusion of type hints for function parameters and return values. These hints can be supplied by the programmer or obtained by dynamic profiler tools; multi-version code generation guarantees the correctness of our AOT transformation in all cases. Our compilation framework performs automatic parallelization and sophisticated high-level code optimizations for the target distributed heterogeneous hardware platform. It includes extensions to the polyhedral framework that unify user-written loops and implicit loops present in matrix/tensor operators, as well as automated selection of CPU vs. GPU code variants. Further, our polyhedral optimizations enable both intra-node and inter-node parallelism. Finally, the optimized output code is deployed using the Ray runtime for scheduling distributed tasks across multiple heterogeneous nodes in a cluster. Our empirical evaluation shows significant performance improvements relative to sequential Python in both single-node and multi-node experiments, with a performance improvement of over 20,000$\times$ when using 24 nodes and 144 GPUs in the OLCF Summit supercomputer for the Space-Time Adaptive Processing (STAP) radar application.
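The idea that multi-version code generation keeps a hint-driven transformation correct can be illustrated with a minimal sketch. This is not the paper's framework: the decorator `multiversion`, the function `dot_specialized`, and the `last_variant` attribute are hypothetical names introduced here, and the "specialized" variant is only a stand-in for compiler-generated code. The sketch assumes the hints are plain classes that `isinstance` can check at run time.

```python
import inspect
from typing import Callable, get_type_hints


def multiversion(specialized: Callable) -> Callable:
    """Run `specialized` only when the runtime arguments match the type hints
    of the decorated function; otherwise fall back to the original code."""
    def decorate(original: Callable) -> Callable:
        hints = get_type_hints(original)
        hints.pop("return", None)  # only parameter hints guard the dispatch
        signature = inspect.signature(original)

        def dispatch(*args, **kwargs):
            bound = signature.bind(*args, **kwargs)
            hints_hold = all(
                isinstance(bound.arguments[name], expected)
                for name, expected in hints.items()
                if name in bound.arguments
            )
            # Record which variant ran (illustration only).
            dispatch.last_variant = "specialized" if hints_hold else "original"
            chosen = specialized if hints_hold else original
            return chosen(*args, **kwargs)

        dispatch.last_variant = None
        return dispatch
    return decorate


def dot_specialized(x: list, y: list) -> float:
    # Hypothetical stand-in for an optimized variant that assumes list inputs.
    return sum(a * b for a, b in zip(x, y))


@multiversion(dot_specialized)
def dot(x: list, y: list) -> float:
    # Original sequential code, kept as the always-correct fallback.
    total = 0
    for i in range(len(x)):
        total += x[i] * y[i]
    return total
```

Calling `dot([1.0, 2.0], [3.0, 4.0])` takes the specialized path because both arguments are lists, while `dot((1, 2), (3, 4))` violates the hints and runs the original loop. Either way the caller gets the same answer, which is the property the guard is meant to preserve.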