Paper Title

Towards Latency-aware DNN Optimization with GPU Runtime Analysis and Tail Effect Elimination

Paper Authors

Fuxun Yu, Zirui Xu, Tong Shen, Dimitrios Stamoulis, Longfei Shangguan, Di Wang, Rishi Madhok, Chunshui Zhao, Xin Li, Nikolaos Karianakis, Dimitrios Lymberopoulos, Ang Li, ChenChen Liu, Yiran Chen, Xiang Chen

Abstract

Despite the superb performance of State-Of-The-Art (SOTA) DNNs, their increasing computational cost makes it very challenging to meet real-time latency and accuracy requirements. Although DNN runtime latency is dictated by model properties (e.g., architecture, operations), hardware properties (e.g., utilization, throughput), and, more importantly, the effective mapping between the two, many existing approaches focus only on optimizing model properties such as FLOPs reduction and overlook the mismatch between DNN model and hardware properties. In this work, we show that the mismatch between varied DNN computation workloads and GPU capacity can cause an idle GPU tail effect, leading to GPU under-utilization and low throughput. As a result, FLOPs reduction cannot bring effective latency reduction, which causes sub-optimal accuracy versus latency trade-offs. Motivated by this, we propose a GPU runtime-aware DNN optimization methodology to adaptively eliminate such GPU tail effects on GPU platforms. Our methodology can be applied on top of existing SOTA DNN optimization approaches to achieve better latency and accuracy trade-offs. Experiments show 11%-27% latency reduction and 2.5%-4.0% accuracy improvement over several SOTA DNN pruning and NAS methods, respectively.
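
The tail effect mentioned in the abstract can be illustrated with a simplified wave-quantization latency model. The sketch below is not the paper's implementation; it only assumes, for illustration, that a layer's work is split into thread blocks that execute in fixed-duration waves of at most `gpu_wave_capacity` blocks, so latency is governed by the number of waves rather than by raw FLOPs. All names and numbers are hypothetical.

```python
import math

def wave_latency(num_blocks: int, gpu_wave_capacity: int, wave_time_ms: float) -> float:
    """Toy latency model: blocks run in full waves; a partially filled
    last wave (the idle 'tail') still costs a whole wave of time."""
    num_waves = math.ceil(num_blocks / gpu_wave_capacity)
    return num_waves * wave_time_ms

# Hypothetical numbers for illustration only: the GPU runs 80 blocks per 1 ms wave.
capacity, wave_ms = 80, 1.0
print(wave_latency(100, capacity, wave_ms))  # 2 waves -> 2.0 ms (20-block idle tail)
print(wave_latency(90, capacity, wave_ms))   # prune 10% of blocks: still 2 waves -> 2.0 ms
print(wave_latency(80, capacity, wave_ms))   # prune to the wave boundary: 1 wave -> 1.0 ms
```

In this toy model, shrinking the workload from 100 to 90 blocks (a FLOPs-style reduction) leaves latency unchanged because the idle tail wave remains, whereas trimming to the 80-block wave boundary halves latency. This mirrors the abstract's argument that FLOPs reduction alone does not guarantee effective latency reduction.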
