Paper Title

Designing for Recommending Intermediate States in A Scientific Workflow Management System

Authors

Debasish Chakroborti, Banani Roy, Sristy Sumana Nath

Abstract

To process large amounts of data sequentially and systematically, proper management of workflow components (i.e., modules, data, configurations, and associations among ports and links) in a Scientific Workflow Management System (SWfMS) is essential. Managing data with provenance in a SWfMS to support the reusability of workflows, modules, and data is not a simple task. Handling such components is even more burdensome for complex workflows that are frequently assembled and executed to investigate large datasets with different technologies (e.g., various learning algorithms or models). Although many studies propose techniques and technologies for managing and recommending services in a SWfMS, only a few consider the management of data in a SWfMS for efficient storage and for facilitating workflow executions. Furthermore, no study has examined the effectiveness and efficiency of such data management in a SWfMS from a user perspective. In this paper, we present and evaluate a GUI version of a novel intermediate data management approach with two use cases (Plant Phenotyping and Bioinformatics). The technique, which we call GUI-RISPTS (Recommending Intermediate States from Pipelines Considering Tool-States), can facilitate executions of workflows with processed data (i.e., intermediate outcomes of modules in a workflow) and can thus reduce the computational time of some modules in a SWfMS. We integrated GUI-RISPTS with an existing workflow management system called SciWorCS. In SciWorCS, we present an interface that users use to select recommended intermediate states (i.e., modules' outcomes). We investigated GUI-RISPTS's effectiveness from users' perspectives and measured its overhead in terms of storage and efficiency in workflow execution.
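
The abstract does not spell out how intermediate states are stored or recommended, so the following is only a minimal sketch of the general idea it describes: keep the outcome of each module, keyed by the dataset and by the modules and tool-states (configurations) that produced it, and resume a later workflow from the longest matching stored prefix. Every name here (`prefix_key`, `run_pipeline`, the in-memory `store` dict, the toy `scale` and `threshold` modules) is hypothetical and is not taken from GUI-RISPTS or SciWorCS.

```python
import hashlib
import json


def prefix_key(dataset_id, steps):
    """Hash a dataset id plus an ordered list of (module name, tool-state) pairs."""
    payload = json.dumps(
        [dataset_id, [[name, state] for name, state, _ in steps]],
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_pipeline(dataset_id, data, steps, store):
    """Run steps in order, resuming from the longest stored prefix.

    steps: list of (module_name, tool_state_dict, function) tuples.
    store: dict mapping prefix keys to stored intermediate results
           (an assumption; a real system would persist these).
    Returns (result, number_of_modules_actually_executed).
    """
    start = 0
    # Look for the longest prefix whose outcome is already stored.
    for n in range(len(steps), 0, -1):
        key = prefix_key(dataset_id, steps[:n])
        if key in store:
            data = store[key]
            start = n
            break

    executed = 0
    for i in range(start, len(steps)):
        _, state, fn = steps[i]
        data = fn(data, **state)
        # Store the intermediate state so later workflows can reuse it.
        store[prefix_key(dataset_id, steps[: i + 1])] = data
        executed += 1
    return data, executed


# Toy modules standing in for real workflow tools.
def scale(values, factor):
    return [v * factor for v in values]


def threshold(values, cutoff):
    return [v for v in values if v >= cutoff]


store = {}
steps_a = [("scale", {"factor": 2}, scale), ("threshold", {"cutoff": 4}, threshold)]
result_a, ran_a = run_pipeline("plants-v1", [1, 2, 3], steps_a, store)

# Same first module and tool-state, different tool-state for the second module.
steps_b = [("scale", {"factor": 2}, scale), ("threshold", {"cutoff": 6}, threshold)]
result_b, ran_b = run_pipeline("plants-v1", [1, 2, 3], steps_b, store)
```

In this sketch the second run executes only the `threshold` module, because the outcome of `scale` with the same tool-state on the same dataset is already in `store`. The trade-off the abstract mentions is visible here as well: each stored outcome costs storage in exchange for skipped computation.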
