Paper Title
Adaptive SpMV/SpMSpV on GPUs for Input Vectors of Varied Sparsity
Paper Authors
Paper Abstract
Despite numerous efforts to optimize the performance of Sparse Matrix-Vector Multiplication (SpMV) on modern hardware architectures, little work has been devoted to its sparse counterpart, Sparse Matrix-Sparse Vector Multiplication (SpMSpV), let alone to handling input vectors of varied sparsity. The key challenge is that, depending on the sparsity level, data distribution, and compute platform, the optimal choice of SpMV/SpMSpV kernel can vary, so a static choice does not suffice. In this paper, we propose an adaptive SpMV/SpMSpV framework that automatically selects the appropriate SpMV/SpMSpV kernel on GPUs for any sparse matrix and vector at runtime. Based on a systematic analysis of key factors such as computing pattern, workload distribution, and write-back strategy, eight candidate SpMV/SpMSpV kernels are encapsulated into the framework to deliver high performance in a seamless manner. A comprehensive study of machine-learning-based kernel selectors is conducted, from both accuracy and overhead perspectives, to choose the kernel and adapt to variations in both the input and the hardware. Experiments demonstrate that the adaptive framework can substantially outperform the previous state of the art in real-world applications on NVIDIA Tesla K40m, P100, and V100 GPUs.
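The abstract describes runtime dispatch between SpMV- and SpMSpV-style kernels driven by properties of the input vector. Below is a minimal host-side sketch of that general idea, not the authors' implementation: it uses a hypothetical CSC matrix type, a single feature (input-vector density), and a made-up 10% threshold to pick between a dense-input path and a sparse-input path. The published framework instead encapsulates eight GPU kernels and selects among them with a trained machine-learning model over several input and hardware features.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical CSC matrix and sparse-vector types (illustrative only; the
// framework in the paper operates on GPU-resident formats and kernels).
struct CscMatrix {
    int rows = 0, cols = 0;
    std::vector<int> col_ptr;    // size cols + 1
    std::vector<int> row_idx;    // size nnz
    std::vector<double> vals;    // size nnz
};

struct SparseVector {
    int n = 0;                   // logical length (== matrix cols)
    std::vector<int> idx;        // indices of nonzeros
    std::vector<double> vals;    // values of nonzeros
};

// SpMV-style path: treat x as dense and touch every column of A.
static void spmv_ref(const CscMatrix& A, const std::vector<double>& x_dense,
                     std::vector<double>& y) {
    std::fill(y.begin(), y.end(), 0.0);
    for (int j = 0; j < A.cols; ++j)
        for (int p = A.col_ptr[j]; p < A.col_ptr[j + 1]; ++p)
            y[A.row_idx[p]] += A.vals[p] * x_dense[j];
}

// SpMSpV-style path: touch only the columns selected by nonzeros of x.
static void spmspv_ref(const CscMatrix& A, const SparseVector& x,
                       std::vector<double>& y) {
    std::fill(y.begin(), y.end(), 0.0);
    for (std::size_t k = 0; k < x.idx.size(); ++k) {
        const int j = x.idx[k];
        for (int p = A.col_ptr[j]; p < A.col_ptr[j + 1]; ++p)
            y[A.row_idx[p]] += A.vals[p] * x.vals[k];
    }
}

// Runtime dispatch on a single feature (input-vector density). The density
// threshold is an assumption for illustration; the paper's selector is a
// trained model choosing among eight GPU kernels.
void adaptive_multiply(const CscMatrix& A, const SparseVector& x,
                       std::vector<double>& y,
                       double density_threshold = 0.1) {
    y.assign(A.rows, 0.0);
    const double density =
        x.n > 0 ? static_cast<double>(x.idx.size()) / x.n : 0.0;
    if (density >= density_threshold) {
        std::vector<double> x_dense(A.cols, 0.0);   // scatter x to dense form
        for (std::size_t k = 0; k < x.idx.size(); ++k)
            x_dense[x.idx[k]] = x.vals[k];
        spmv_ref(A, x_dense, y);                    // dense-input kernel family
    } else {
        spmspv_ref(A, x, y);                        // sparse-input kernel family
    }
}
```

In applications such as breadth-first search or PageRank, the density of the input vector changes from iteration to iteration, which is why a per-call decision like the one sketched above (rather than a one-time static choice) is needed.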