Title
AI-based Resource Allocation: Reinforcement Learning for Adaptive Auto-scaling in Serverless Environments
Authors
Abstract
Serverless computing has emerged as a compelling new cloud computing paradigm in recent years. It promises users services at large scale and low cost while eliminating the need for infrastructure management. On the cloud provider's side, flexible resource management is required to meet fluctuating demand. This can be enabled through automated provisioning and deprovisioning of resources. A common approach among both commercial and open-source serverless computing platforms is workload-based auto-scaling, where a designated algorithm scales instances according to the number of incoming requests. In the recently evolving serverless framework Knative, a request-based policy is proposed, where the algorithm scales resources by a configured maximum number of requests that can be processed in parallel per instance, the so-called concurrency. As we show in a baseline experiment, this predefined concurrency level can strongly influence the performance of a serverless application. However, identifying the concurrency configuration that yields the highest possible quality of service is a challenging task due to various factors, e.g., varying workloads and complex infrastructure characteristics that influence throughput and latency. While there has been considerable research into intelligent techniques for optimizing auto-scaling for virtual machine provisioning, this topic has not yet been discussed in the area of serverless computing. For this reason, we investigate the applicability of a reinforcement learning approach, which has proven effective for dynamic virtual machine provisioning, to request-based auto-scaling in a serverless framework. Our results show that, within a limited number of iterations, our proposed model learns an effective scaling policy per workload, improving performance compared to the default auto-scaling configuration.
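The request-based policy described in the abstract can be sketched as follows: the autoscaler provisions enough instances so that each instance handles at most the configured concurrency level in parallel. This is a minimal illustrative sketch, not Knative's actual implementation; the function name and the instance-bound parameters are assumptions introduced here for clarity.

```python
import math


def desired_instances(in_flight_requests: int, target_concurrency: int,
                      min_instances: int = 1, max_instances: int = 100) -> int:
    """Illustrative request-based scaling decision.

    Scale out so that no instance processes more than `target_concurrency`
    requests in parallel, clamped to the [min_instances, max_instances] range
    (bounds are hypothetical, for illustration only).
    """
    if target_concurrency <= 0:
        raise ValueError("target concurrency must be positive")
    # Each instance absorbs up to `target_concurrency` parallel requests,
    # so the required instance count is the ceiling of the ratio.
    desired = math.ceil(in_flight_requests / target_concurrency)
    return max(min_instances, min(desired, max_instances))
```

For example, with 250 requests in flight and a configured concurrency of 50, the sketch yields 5 instances; a lower concurrency setting for the same load yields more, smaller-loaded instances, which is exactly the trade-off the paper's reinforcement learning approach tunes per workload.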