Paper Title
SparseRT: Accelerating Unstructured Sparsity on GPUs for Deep Learning Inference
Paper Authors
Paper Abstract
In recent years, there has been a flurry of research in deep neural network pruning and compression. Early approaches prune weights individually. However, it is difficult to take advantage of the resulting unstructured sparsity patterns on modern hardware like GPUs. As a result, pruning strategies that impose sparsity structures in the weights have become more popular. However, these structured pruning approaches typically lead to higher losses in accuracy than unstructured pruning. In this paper, we present SparseRT, a code generator that leverages unstructured sparsity to accelerate sparse linear algebra operations in deep learning inference on GPUs. For 1x1 convolutions and fully connected layers, we demonstrate a geometric mean speedup of 3.4x over the equivalent dense computation at 90% sparsity and 5.4x at 95% sparsity, evaluated on hundreds of test cases in deep learning. For sparse 3x3 convolutions, we show speedups of over 5x on use cases in ResNet-50.
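
To make the code-generation idea concrete, the CUDA sketch below illustrates one general way unstructured sparsity can be exploited at compile time (a minimal sketch, not SparseRT's actual generated code): a 1x1 convolution is viewed as a matrix multiplication between a Cout x Cin weight matrix and a Cin x N input matrix, and a sparsity-aware generator bakes the nonzero weight values and their column indices into the kernel as constants. The sparsity pattern, sizes, and names here are illustrative assumptions.

// Minimal sketch (not SparseRT's actual generated code): a 1x1 convolution
// viewed as output[Cout x N] = W[Cout x Cin] * input[Cin x N], where the
// sparse weight matrix W has been baked into the kernel at code-generation
// time. The pattern, sizes, and values below are hypothetical.

#include <cstdio>
#include <cuda_runtime.h>

#define N 8  // number of spatial positions (hypothetical)

// Kernel "generated" for one concrete sparsity pattern: output row 0 depends
// only on input rows 1 and 3; output row 1 depends only on input row 2.
// Nonzero weight values and column indices are compile-time constants, so no
// indices or zero weights are loaded from memory at run time.
__global__ void sparse_1x1_conv_generated(const float* __restrict__ in,
                                          float* __restrict__ out) {
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // spatial position
    if (j >= N) return;
    // Unrolled nonzeros of W row 0: W[0][1] = 0.5f, W[0][3] = -1.25f
    out[0 * N + j] = 0.5f * in[1 * N + j] + -1.25f * in[3 * N + j];
    // Unrolled nonzeros of W row 1: W[1][2] = 2.0f
    out[1 * N + j] = 2.0f * in[2 * N + j];
}

int main() {
    const int Cin = 4, Cout = 2;
    float h_in[Cin * N], h_out[Cout * N];
    for (int i = 0; i < Cin * N; ++i) h_in[i] = 1.0f;

    float *d_in, *d_out;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(h_out));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    sparse_1x1_conv_generated<<<1, N>>>(d_in, d_out);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

    printf("out[0][0] = %f, out[1][0] = %f\n", h_out[0], h_out[N]);
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}

Because the weight values and their positions are literals in the instruction stream, the kernel reads only the input rows that actually contribute to each output row, which is one way unstructured sparsity can be turned into fewer memory accesses and multiply-adds at inference time.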