Paper Title
Extending SLURM for Dynamic Resource-Aware Adaptive Batch Scheduling
Authors
Abstract
With tightening power budgets and rising hardware failure rates, the operation of future exascale systems faces several challenges. To address them, the HPC community has been actively researching resource awareness and adaptivity through malleable jobs. Malleable jobs can change their computing resources at runtime and can significantly improve HPC system performance. However, because popular parallel programming paradigms such as MPI are rigid and batch systems lack support for dynamic resource management, malleable jobs have remained largely unrealized. In this paper, we extend the SLURM batch system to support the execution and batch scheduling of malleable jobs. The malleable applications are written using a new adaptive parallel paradigm called Invasive MPI, which extends the MPI standard to support resource adaptivity at runtime. We propose two malleable job scheduling strategies, one performance-aware and one power-aware, that make dynamic reconfiguration decisions at runtime. We implement the strategies in SLURM and evaluate them on a production HPC system. Compared to other scheduling strategies, our performance-aware strategy improves makespan, average system utilization, average response time, and average waiting time. Moreover, we demonstrate dynamic power corridor management using our power-aware strategy.