Paper title

Hierarchical Roofline Performance Analysis for Deep Learning Applications

Authors

Charlene Yang, Yunsong Wang, Steven Farrell, Thorsten Kurth, Samuel Williams

Abstract

This paper presents a practical methodology for collecting the performance data needed to conduct hierarchical Roofline analysis on NVIDIA GPUs. It discusses extensions to the Empirical Roofline Toolkit that broaden its support for a range of data precisions and add Tensor Core support, and it introduces an Nsight Compute-based method to accurately collect application performance information. This methodology allows for automated machine characterization and application characterization for Roofline analysis across the entire memory hierarchy on NVIDIA GPUs, and it is validated on a complex deep learning application used for climate image segmentation. We use two versions of the code, in TensorFlow and PyTorch respectively, to demonstrate the use and effectiveness of this methodology. We highlight how the application utilizes the compute and memory capabilities of the GPU and how its implementation and performance differ between the two deep learning frameworks.
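A hierarchical Roofline bounds a kernel's attainable performance separately at each memory level: the arithmetic intensity at a level is total FLOPs divided by the bytes moved at that level, and the ceiling is min(peak compute, intensity × level bandwidth). The Python sketch below illustrates only that arithmetic. The `Level` class, the function names, and all numbers are hypothetical placeholders, not values measured in the paper or outputs of the Empirical Roofline Toolkit or Nsight Compute. In practice, the FLOP count, runtime, and per-level byte counts would come from profiler metrics, and the peak compute and bandwidths from machine characterization.

```python
from dataclasses import dataclass


@dataclass
class Level:
    """One memory level in the hierarchy (hypothetical illustration)."""
    name: str
    bandwidth_gbs: float  # peak bandwidth of this level, in GB/s
    bytes_moved: float    # bytes the kernel moved through this level


def hierarchical_roofline(flops, runtime_s, peak_gflops, levels):
    """Return achieved GFLOP/s and, per level, (arithmetic intensity, ceiling GFLOP/s)."""
    achieved_gflops = flops / runtime_s / 1e9
    per_level = {}
    for lvl in levels:
        intensity = flops / lvl.bytes_moved  # FLOPs per byte at this level
        # FLOP/byte * GB/s = GFLOP/s; the ceiling is capped by peak compute.
        ceiling = min(peak_gflops, intensity * lvl.bandwidth_gbs)
        per_level[lvl.name] = (intensity, ceiling)
    return achieved_gflops, per_level


def binding_level(per_level):
    """Name of the level with the lowest ceiling, i.e. the tightest bound."""
    return min(per_level, key=lambda name: per_level[name][1])


if __name__ == "__main__":
    # Illustrative numbers only, not measurements of any real GPU or kernel.
    levels = [
        Level("L1", 14000.0, 1e12),
        Level("L2", 3000.0, 4e11),
        Level("HBM", 900.0, 2e11),
    ]
    achieved, per_level = hierarchical_roofline(2e12, 0.5, 15000.0, levels)
    print(f"achieved: {achieved:.1f} GFLOP/s")
    for name, (ai, ceiling) in per_level.items():
        print(f"{name}: AI = {ai:.2f} FLOP/B, ceiling = {ceiling:.1f} GFLOP/s")
    print("tightest bound:", binding_level(per_level))
```

With these made-up inputs, the HBM ceiling (9000 GFLOP/s) is the lowest, so this hypothetical kernel would be reported as bound by HBM bandwidth rather than by L1, L2, or peak compute. Comparing the gap between the achieved rate and each level's ceiling is what lets a hierarchical Roofline show where in the memory hierarchy a kernel loses performance.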