Paper Title
Python Workflows on HPC Systems
Authors
Abstract
The recent successes and widespread application of compute-intensive machine learning and data analytics methods have boosted the usage of the Python programming language on HPC systems. While Python provides many advantages for users, it was not designed with a focus on multi-user environments or parallel programming, which makes it quite challenging to maintain stable and secure Python workflows on an HPC system. In this paper, we analyze the key problems induced by the usage of Python on HPC clusters and sketch appropriate workarounds for efficiently maintaining multi-user Python software environments, securing and restricting the resources of Python jobs, and containing Python processes, with a focus on Deep Learning applications running on GPU clusters.