Paper Title
nOS-V: Co-Executing HPC Applications Using System-Wide Task Scheduling
Paper Authors
Paper Abstract
Future Exascale systems will feature massive parallelism, many-core processors and heterogeneous architectures. In this scenario, it is increasingly difficult for HPC applications to fully and efficiently utilize the resources in system nodes. Moreover, the increased parallelism exacerbates the effects of existing inefficiencies in current applications. Research has shown that co-scheduling applications to share system nodes, instead of executing each application exclusively, can increase resource utilization and efficiency. Nevertheless, current oversubscription and co-location techniques for sharing nodes have several drawbacks, which limit their applicability and make them highly application-dependent.

This paper presents co-execution through system-wide scheduling. Co-execution is a novel fine-grained technique for executing multiple HPC applications simultaneously on the same node, outperforming current state-of-the-art approaches. We implement this technique in nOS-V, a lightweight tasking library that supports co-execution through system-wide task scheduling. Moreover, nOS-V can be easily integrated with existing programming models and requires no changes to user applications. We showcase how co-execution with nOS-V significantly reduces schedule makespan for several applications in single-node and distributed environments, outperforming prior node-sharing techniques.
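To make the makespan claim concrete, the toy model below contrasts exclusive execution (one application at a time on the whole node) with a single scheduler that dispatches tasks from all applications onto a shared pool of cores. This is a minimal sketch under stated assumptions, not nOS-V's API or scheduling algorithm: the `App` type, the `makespan` function, the "independent tasks with a per-application parallelism cap" model, and the round-robin greedy policy are all hypothetical choices made for illustration. Schedule makespan here means the time from the start of the first task to the completion of the last one.

```python
import heapq
from dataclasses import dataclass


@dataclass
class App:
    """Hypothetical application model (illustration only): a bag of
    independent tasks plus a cap on how many can run at once, standing in
    for the limited parallelism an application exposes."""
    name: str
    task_durations: list[float]
    max_parallel: int


def makespan(apps: list[App], cores: int) -> float:
    """Greedy list scheduling of every app's tasks on one shared core pool.

    Assumed policy (not nOS-V's): whenever a core is free, give it to the
    next app in round-robin order that still has pending tasks and has not
    reached its parallelism cap. Requires cores >= 1 and max_parallel >= 1.
    """
    pending = [list(app.task_durations) for app in apps]  # copy; apps untouched
    running = [0] * len(apps)                # tasks currently running per app
    events: list[tuple[float, int]] = []     # min-heap of (finish time, app index)
    free, now, rr = cores, 0.0, 0

    while True:
        # Fill free cores with tasks from any eligible application.
        progress = True
        while free > 0 and progress:
            progress = False
            for k in range(len(apps)):
                i = (rr + k) % len(apps)
                if pending[i] and running[i] < apps[i].max_parallel:
                    duration = pending[i].pop(0)
                    heapq.heappush(events, (now + duration, i))
                    running[i] += 1
                    free -= 1
                    rr = (i + 1) % len(apps)
                    progress = True
                    break
        if not events:
            return now  # nothing running and nothing left to dispatch
        # Advance the clock to the next task completion and release its core.
        now, i = heapq.heappop(events)
        running[i] -= 1
        free += 1


if __name__ == "__main__":
    # Two apps on an 8-core node, each able to use only 4 cores at a time.
    a = App("A", [1.0] * 8, max_parallel=4)
    b = App("B", [1.0] * 8, max_parallel=4)
    exclusive = makespan([a], 8) + makespan([b], 8)  # one app after the other
    coexec = makespan([a, b], 8)                     # shared system-wide pool
    print(exclusive, coexec)  # 4.0 2.0
```

In this example, each application alone leaves half the node idle, so running them back to back takes twice as long as letting one scheduler fill the idle cores with the other application's tasks. Real workloads add task dependencies, memory-bandwidth contention and communication, which this model deliberately ignores.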