Paper Title

Resource Allocation in Serverless Query Processing

Authors

Simon Kassing, Ingo Müller, Gustavo Alonso

Abstract

Data lakes hold a growing amount of cold data that is infrequently accessed, yet requires interactive response times. Serverless functions are seen as a way to address this use case since they offer an appealing alternative to maintaining (and paying for) a fixed infrastructure. Recent research has analyzed the potential of serverless for data processing. In this paper, we expand on such work by looking into the question of allocating serverless resources (the number and size of the functions) to data processing tasks. We formulate a general model to roughly estimate completion time and financial cost, which we apply to augment an existing serverless data processing system with an advisory tool that automatically identifies configurations striking a good balance, which we define as being close to the "knee" of their Pareto frontier. The model takes into account key aspects of serverless: start-up, computation, network transfers, and overhead as a function of the input sizes and intermediate result exchanges. Using (micro)benchmarks and parts of TPC-H, we show that this advisor is capable of pinpointing configurations desirable to the user. Moreover, we identify and discuss several aspects of data processing on serverless that affect efficiency. By using an automated tool to configure the resources, the barrier to using serverless for data processing is lowered, and the narrow window where it is cost-effective can be expanded by using a better allocation instead of having to over-provision the design.
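The abstract describes the model and the advisor only in outline. The following is a minimal sketch of how such an estimate-then-advise loop could look, assuming a purely hypothetical linear model: per-function time is start-up plus computation plus network transfer (input read and intermediate exchange), compute throughput is assumed to scale linearly with function memory, and cost is billed per GB-second. All names (`Config`, `Estimate`, `estimate`, `pareto_frontier`, `knee`) and all constants are illustrative placeholders, not the authors' model, their measurements, or any provider's actual pricing. The knee is located with a common chord-distance heuristic that is swapped in here for illustration; the paper's own selection rule may differ.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical placeholder constants; none of these values come from the paper.
PRICE_PER_GB_S = 0.0000166667   # price per GB-second of function runtime
STARTUP_S = 0.5                 # start-up latency of each function
INVOKE_S = 0.01                 # coordinator-side overhead per invoked function
BANDWIDTH_MB_S = 80.0           # network bandwidth of a single function
MB_S_PER_GB = 100.0             # compute throughput per GB of memory (assumed linear)


@dataclass(frozen=True)
class Config:
    workers: int       # number of functions
    memory_gb: float   # size (memory) of each function


@dataclass(frozen=True)
class Estimate:
    config: Config
    time_s: float
    cost: float


def estimate(cfg: Config, input_mb: float, shuffle_mb: float) -> Estimate:
    """Rough completion time and cost of one stage split evenly across cfg.workers functions."""
    per_worker_in = input_mb / cfg.workers
    per_worker_shuffle = shuffle_mb / cfg.workers
    compute = per_worker_in / (MB_S_PER_GB * cfg.memory_gb)
    # Input read plus intermediate exchange, simplified to a single transfer each.
    network = (per_worker_in + per_worker_shuffle) / BANDWIDTH_MB_S
    per_function_s = STARTUP_S + compute + network
    # Invoking many functions adds end-to-end latency but is not billed per function here.
    time_s = INVOKE_S * cfg.workers + per_function_s
    cost = cfg.workers * per_function_s * cfg.memory_gb * PRICE_PER_GB_S
    return Estimate(cfg, time_s, cost)


def pareto_frontier(points: list[Estimate]) -> list[Estimate]:
    """Keep estimates not dominated in (time, cost); time ascending, cost strictly decreasing."""
    frontier: list[Estimate] = []
    best_cost = float("inf")
    for p in sorted(points, key=lambda e: (e.time_s, e.cost)):
        if p.cost < best_cost:
            frontier.append(p)
            best_cost = p.cost
    return frontier


def knee(frontier: list[Estimate]) -> Estimate:
    """Frontier point farthest below the chord joining its two extremes (min-max normalized)."""
    if len(frontier) == 1:
        return frontier[0]
    t_min, t_max = frontier[0].time_s, frontier[-1].time_s
    c_min, c_max = frontier[-1].cost, frontier[0].cost

    def chord_score(e: Estimate) -> float:
        # After normalization the chord is nt + nc = 1, so a smaller sum lies farther below it.
        nt = (e.time_s - t_min) / (t_max - t_min)
        nc = (e.cost - c_min) / (c_max - c_min)
        return nt + nc

    return min(frontier, key=chord_score)


if __name__ == "__main__":
    configs = [Config(w, m) for w, m in product([1, 2, 4, 8, 16, 32, 64], [0.5, 1.0, 2.0, 3.0])]
    estimates = [estimate(c, input_mb=10_000, shuffle_mb=2_000) for c in configs]
    best = knee(pareto_frontier(estimates))
    print(best.config, round(best.time_s, 2), round(best.cost, 6))
```

Enumerating a small grid of function counts and sizes is cheap because each estimate is a closed-form expression; the advisor then only needs to discard dominated configurations and pick one point on what remains. The chord heuristic favors the point where further speed-up starts to cost disproportionately more, which matches the intuition of a "knee", but other definitions (for example, maximum curvature) are equally plausible.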