Paper Title

FuncPipe: A Pipelined Serverless Framework for Fast and Cost-efficient Training of Deep Learning Models

Authors

Yunzhuo Liu, Bo Jiang, Tian Guo, Zimeng Huang, Wenhao Ma, Xinbing Wang, Chenghu Zhou

Abstract

Training deep learning (DL) models in the cloud has become the norm. With the emergence of serverless computing and its benefits of true pay-as-you-go pricing and scalability, systems researchers have recently started to provide support for serverless-based training. However, the ability to train DL models on serverless platforms is hindered by the resource limitations of today's serverless infrastructure and by DL models' explosive requirements for memory and bandwidth. This paper describes FuncPipe, a novel pipelined training framework specifically designed for serverless platforms that enables fast and low-cost training of DL models. FuncPipe is designed with the key insight that model partitioning can be leveraged to bridge both the memory and bandwidth gaps between the capacity of serverless functions and the requirements of DL training. Although the idea is conceptually simple, it raises several design questions, including how to partition the model, how to configure each serverless function, and how to exploit each function's uplink/downlink bandwidth. In particular, we tailor a micro-batch scheduling policy for the serverless environment, which serves as the basis for the subsequent optimization. Our Mixed-Integer Quadratic Programming formulation automatically and simultaneously configures serverless resources and partitions models to fit within the resource constraints. Lastly, we improve the bandwidth efficiency of storage-based synchronization with a novel pipelined scatter-reduce algorithm. We implement FuncPipe on two popular cloud serverless platforms and show that it achieves 7%-77% cost savings and 1.3X-2.2X speedup compared to state-of-the-art serverless-based frameworks.
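For readers unfamiliar with storage-based synchronization, the following is a minimal sketch of a plain (non-pipelined) scatter-reduce, in which workers exchange gradient chunks only through shared storage. It is not FuncPipe's pipelined algorithm, which the abstract does not specify. The function names, the in-memory dict standing in for an object store, and the key layout are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def split_chunks(vec: List[float], n: int) -> List[List[float]]:
    """Split vec into n contiguous chunks of near-equal size."""
    size, rem = divmod(len(vec), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < rem else 0)
        chunks.append(vec[start:end])
        start = end
    return chunks


def scatter_reduce(grads: List[List[float]]) -> List[List[float]]:
    """Sum the gradients of all workers via a shared store (hypothetical).

    grads[w] is worker w's local gradient; every worker ends up with the
    element-wise sum. Workers never talk to each other directly.
    """
    n = len(grads)
    # Assumption: a dict stands in for cloud object storage.
    storage: Dict[Tuple[str, int, int], List[float]] = {}
    local = [split_chunks(g, n) for g in grads]

    # Scatter: worker w uploads chunk c for every chunk it does not own.
    for w in range(n):
        for c in range(n):
            if c != w:
                storage[("part", w, c)] = local[w][c]

    # Reduce: worker c downloads all copies of chunk c, sums them,
    # and uploads the reduced chunk.
    for c in range(n):
        total = list(local[c][c])
        for w in range(n):
            if w != c:
                part = storage[("part", w, c)]
                total = [a + b for a, b in zip(total, part)]
        storage[("sum", c, c)] = total

    # Gather: every worker downloads all reduced chunks and concatenates them.
    result = []
    for _ in range(n):
        merged: List[float] = []
        for c in range(n):
            merged.extend(storage[("sum", c, c)])
        result.append(merged)
    return result


if __name__ == "__main__":
    print(scatter_reduce([[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]))
```

In this sketch the three phases run strictly one after another, so each worker's uplink sits idle while it downloads and vice versa. Overlapping uploads and downloads is presumably the kind of inefficiency a pipelined variant targets, though the details are left to the paper.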
