A Data-Driven Frequency Scaling Approach for Deadline-aware Energy Efficient Scheduling on Graphics Processing Units (GPUs)

Paper Title

A Data-Driven Frequency Scaling Approach for Deadline-aware Energy Efficient Scheduling on Graphics Processing Units (GPUs)

Authors

Ilager, Shashikant; Muralidhar, Rajeev; Rammohanrao, Kotagiri; Buyya, Rajkumar

Abstract

Modern computing paradigms, such as cloud computing, are increasingly adopting GPUs to boost their computing capabilities, primarily due to the heterogeneous nature of AI/ML/deep learning workloads. However, the energy consumption of GPUs is a critical problem. Dynamic Voltage Frequency Scaling (DVFS) is a widely used technique to reduce the dynamic power of GPUs. Yet configuring the optimal clock frequency for a given performance requirement is a non-trivial task because of the complex nonlinear relationship between an application's runtime performance characteristics, energy, and execution time. It becomes more challenging when different applications behave differently under similar clock settings. Simple analytical solutions and standard GPU frequency scaling heuristics fail to capture these intricacies and to scale the frequencies appropriately. In this regard, we propose a data-driven frequency scaling technique that predicts the power and execution time of a given application over different clock settings. We collect data from application profiling and train models to predict the outcome accurately. The proposed solution is generic and can be easily extended to different kinds of workloads and GPU architectures. Furthermore, using this prediction-based frequency scaling, we present a deadline-aware application scheduling algorithm that reduces energy consumption while meeting application deadlines. We conduct extensive experiments on real NVIDIA GPUs using several benchmark applications. The experimental results show that our prediction models have high accuracy, with average RMSE values of 0.38 and 0.05 for energy and time prediction, respectively. In addition, the scheduling algorithm consumes 15.07% less energy than the baseline policies.
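
The abstract does not spell out how a clock setting is chosen from the predictions, so the following is only a minimal sketch of one plausible reading: given predicted power and execution time for each candidate clock setting, pick the setting with the lowest predicted energy among those that meet the deadline. The names `ClockPrediction` and `select_frequency`, the fallback to the fastest setting when no candidate meets the deadline, and all numeric values are assumptions made for illustration; they are not taken from the paper.

```python
from typing import NamedTuple, Sequence


class ClockPrediction(NamedTuple):
    """Hypothetical record: model output for one candidate clock setting."""
    freq_mhz: int    # candidate core clock in MHz
    power_w: float   # predicted average power in watts
    time_s: float    # predicted execution time in seconds


def select_frequency(predictions: Sequence[ClockPrediction],
                     deadline_s: float) -> ClockPrediction:
    """Return the lowest-energy setting whose predicted time meets the deadline.

    Energy is approximated as predicted power times predicted time.
    If no setting meets the deadline, fall back to the fastest one
    (an assumption of this sketch, not a rule stated in the abstract).
    """
    if not predictions:
        raise ValueError("at least one clock setting is required")

    feasible = [p for p in predictions if p.time_s <= deadline_s]
    if not feasible:
        # No setting can meet the deadline: minimise lateness instead.
        return min(predictions, key=lambda p: p.time_s)

    return min(feasible, key=lambda p: p.power_w * p.time_s)


# Illustrative, made-up predictions for three clock settings.
candidates = [
    ClockPrediction(freq_mhz=600, power_w=90.0, time_s=12.0),
    ClockPrediction(freq_mhz=900, power_w=120.0, time_s=8.0),
    ClockPrediction(freq_mhz=1200, power_w=170.0, time_s=6.5),
]

choice = select_frequency(candidates, deadline_s=10.0)
print(choice.freq_mhz)
```

With these made-up numbers, a tighter deadline forces a higher clock even though it costs more energy, which is the trade-off a deadline-aware scheduler has to manage.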
