Title

Global Optimization of Data Pipelines in Heterogeneous Cloud Environments

Authors

Lin, Erica; Xu, Luna; Bramhavar, Suraj; de Oca, Marco Montes; Gorsky, Sean; Yi, Lingyun; Groetsema, Arianna; Chou, Jeffrey

Abstract

对于许多基于云的公司而言,现代生产数据处理和机器学习管道是关键的组件。这些管道通常由由定向的无环图(DAG)表示的复杂工作流程组成。云环境对这些工作流程具有吸引力,因为各种各样的实例和价格都可以为不同的成本绩效需求提供灵活性。但是,这种灵活性还导致为DAG中的每个任务选择正确的资源配置(例如实例类型,资源需求)的复杂性,同时使用所选资源安排任务以达到最佳的端到端性能和成本。这两个决定通常是相互依存的,导致NP安全的优化瓶颈。现有解决方案仅专注于任何问题,而忽略端到端最佳的共同效应。我们建议Agora,Agora是一项调度程序,它考虑了在异质云环境中整体DAG工作流的任务级资源分配和执行。 Agora First(1)从先前运行中研究任务的特征,并对资源配置进行预测,(2)自动找到最佳的配置,并具有其整个工作流程的相应时间表,并具有成本绩效目标。我们在异类的亚马逊网络服务(AWS)云环境中评估了Agora,与最先进的调度程序相比,气流提供了多租户工作流程,并证明了高达45%的性能提高,成本降低了77%。此外,我们将Agora应用于阿里巴巴的实际生产痕迹,并显示成本降低65%,DAG完成时间降低了57%。

Modern production data processing and machine learning pipelines on the cloud are critical components for many cloud-based companies. These pipelines are typically composed of complex workflows represented by directed acyclic graphs (DAGs). Cloud environments are attractive for these workflows because their wide range of heterogeneous instance types and prices provides flexibility for different cost-performance needs. However, this flexibility also makes it complex to select the right resource configuration (e.g., instance type, resource demands) for each task in the DAG while simultaneously scheduling the tasks on the selected resources to reach the optimal end-to-end performance and cost. These two decisions are often codependent, resulting in an NP-hard scheduling optimization bottleneck. Existing solutions focus on only one of the two problems and ignore their combined effect on the end-to-end optimum. We propose AGORA, a scheduler that considers both task-level resource allocation and execution for DAG workflows as a whole in heterogeneous cloud environments. AGORA (1) studies the characteristics of the tasks from prior runs and predicts suitable resource configurations, and (2) automatically finds the best configuration, together with its corresponding schedule, for the entire workflow under a cost-performance objective. We evaluate AGORA in a heterogeneous Amazon Web Services (AWS) cloud environment with multi-tenant workflows served by Airflow and demonstrate a performance improvement of up to 45% and a cost reduction of up to 77% compared to state-of-the-art schedulers. In addition, we apply AGORA to a real-world production trace from Alibaba and show a cost reduction of 65% and a DAG completion time reduction of 57%.
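
The codependence between per-task resource selection and end-to-end performance shows up even in a four-task DAG. The Python sketch that follows is a hypothetical illustration, not AGORA's algorithm: the task names, runtimes, and hourly prices in `PROFILES` are invented stand-ins for profiles learned from prior runs; every task is assumed to get its own instance as soon as its upstream tasks finish (no resource contention); and the objective `cost + delay_penalty * makespan` is an assumed scalarization of the cost-performance goal. It brute-forces every combination of configurations, which grows exponentially with the number of tasks.

```python
from graphlib import TopologicalSorter
from itertools import product

# Hypothetical per-task profiles, as if learned from prior runs:
# task -> {config_name: (runtime_hours, price_per_hour)}
PROFILES = {
    "extract":  {"small": (2.0, 0.10), "large": (1.0, 0.40)},
    "train":    {"small": (4.0, 0.10), "large": (1.5, 0.40)},
    "validate": {"small": (1.0, 0.10), "large": (0.5, 0.40)},
    "report":   {"small": (1.0, 0.10), "large": (0.5, 0.40)},
}
# Hypothetical DAG as task -> set of upstream tasks (the format graphlib expects).
DEPS = {
    "extract": set(),
    "train": {"extract"},
    "validate": {"extract"},
    "report": {"train", "validate"},
}


def evaluate(assignment, profiles, deps):
    """Return (makespan, cost) for one chosen config per task.

    Assumes unlimited instances: each task starts as soon as all of its
    upstream tasks have finished, so the makespan is the critical-path length.
    """
    finish = {}
    for task in TopologicalSorter(deps).static_order():
        runtime, _ = profiles[task][assignment[task]]
        start = max((finish[d] for d in deps[task]), default=0.0)
        finish[task] = start + runtime
    cost = 0.0
    for task, config in assignment.items():
        runtime, price = profiles[task][config]
        cost += runtime * price  # pay for the instance while the task runs
    return max(finish.values()), cost


def best_assignment(profiles, deps, delay_penalty):
    """Exhaustively score every combination of per-task configs.

    score = cost + delay_penalty * makespan, with delay_penalty in $ per hour.
    Returns (score, assignment, makespan, cost) for the lowest score.
    """
    tasks = list(profiles)
    best = None
    for configs in product(*(profiles[t] for t in tasks)):
        assignment = dict(zip(tasks, configs))
        makespan, cost = evaluate(assignment, profiles, deps)
        score = cost + delay_penalty * makespan
        if best is None or score < best[0]:
            best = (score, assignment, makespan, cost)
    return best


if __name__ == "__main__":
    for penalty in (0.0, 1.0):
        score, assignment, makespan, cost = best_assignment(PROFILES, DEPS, penalty)
        print(penalty, assignment, makespan, round(cost, 2))
```

With `delay_penalty=1.0`, the search moves `extract`, `train`, and `report` to the larger instance but keeps `validate` on the small one: `validate` is off the critical path, so a faster instance would add cost without shortening the makespan. Choosing each task's configuration in isolation (minimizing its own cost plus penalized runtime) would upgrade `validate` anyway. Once instances are shared and limited, the scheduling order also changes the makespan, which this sketch deliberately ignores.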