Paper Title

Reshi: Recommending Resources for Scientific Workflow Tasks on Heterogeneous Infrastructures

Authors

Jonathan Bader, Fabian Lehmann, Alexander Groth, Lauritz Thamsen, Dominik Scheinert, Jonathan Will, Ulf Leser, Odej Kao

Abstract

Scientific workflows typically comprise a multitude of different processing steps which often are executed in parallel on different partitions of the input data. These executions, in turn, must be scheduled on the compute nodes of the computational infrastructure at hand. This assignment is complicated by the facts that (a) tasks typically have highly heterogeneous resource requirements and (b) in many infrastructures, compute nodes offer highly heterogeneous resources. In consequence, predictions of the runtime of a given task on a given node, as required by many scheduling algorithms, are often rather imprecise, which can lead to sub-optimal scheduling decisions. We propose Reshi, a method for recommending task-node assignments during workflow execution that can cope with heterogeneous tasks and heterogeneous nodes. Reshi approaches the problem as a regression task, where task-node pairs are modeled as feature vectors over the results of dedicated micro benchmarks and past task executions. Based on these features, Reshi trains a regression tree model to rank and recommend nodes for each ready-to-run task, which can be used as input to a scheduler. For our evaluation, we benchmarked 27 AWS machine types using three representative workflows. We compare Reshi's recommendations with three state-of-the-art schedulers. Our evaluation shows that Reshi outperforms HEFT by a mean makespan reduction of 7.18% and 18.01% assuming a mean task runtime prediction error of 15%.
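
The idea of ranking nodes with a regression tree over task-node feature vectors can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the two-dimensional task and node features, the toy runtime formula used to generate "past executions", the greedy variance-reducing tree, and the names `build_tree`, `predict`, and `rank_nodes`. The abstract does not specify the actual features, regression target, or model configuration that Reshi uses.

```python
from statistics import mean

def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(rows, targets, depth=0, max_depth=10):
    """Greedy regression tree: a leaf is a number, an inner node is a
    tuple (feature_index, threshold, left_subtree, right_subtree)."""
    if depth >= max_depth or len(set(targets)) == 1:
        return mean(targets)
    best = None  # (score, feature_index, threshold)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [t for r, t in zip(rows, targets) if r[f] <= thr]
            right = [t for r, t in zip(rows, targets) if r[f] > thr]
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, thr)
    if best is None:  # all feature vectors identical, cannot split further
        return mean(targets)
    _, f, thr = best
    left = [(r, t) for r, t in zip(rows, targets) if r[f] <= thr]
    right = [(r, t) for r, t in zip(rows, targets) if r[f] > thr]
    return (
        f,
        thr,
        build_tree([r for r, _ in left], [t for _, t in left], depth + 1, max_depth),
        build_tree([r for r, _ in right], [t for _, t in right], depth + 1, max_depth),
    )

def predict(tree, x):
    """Walk from the root to a leaf and return the predicted runtime."""
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if x[f] <= thr else right
    return tree

# Hypothetical node benchmark scores: [cpu_score, io_score]
NODES = {"cpu_heavy": [4.0, 1.0], "io_heavy": [1.0, 4.0], "balanced": [2.0, 2.0]}
# Hypothetical task demands derived from past runs: [cpu_work, io_work]
TASKS = {"align": [8.0, 1.0], "sort": [1.0, 8.0], "merge": [4.0, 4.0]}

def observed_runtime(task, node):
    """Toy stand-in for measured runtimes of past task executions."""
    return task[0] / node[0] + task[1] / node[1]

# One feature vector per task-node pair: task features followed by node features.
rows = [TASKS[t] + NODES[n] for t in TASKS for n in NODES]
targets = [observed_runtime(TASKS[t], NODES[n]) for t in TASKS for n in NODES]
tree = build_tree(rows, targets)

def rank_nodes(tree, task_features, nodes):
    """Return node names ordered from lowest to highest predicted runtime."""
    return sorted(nodes, key=lambda n: predict(tree, task_features + nodes[n]))

ranking = rank_nodes(tree, TASKS["align"], NODES)
```

In this sketch the resulting `ranking` is what a scheduler would consume as a recommendation for a ready-to-run task. A real setup would replace the toy data with micro-benchmark results and logged executions, and would likely use an established tree library rather than this hand-written one.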