Title

AutoTSMM: An Auto-tuning Framework for Building High-Performance Tall-and-Skinny Matrix-Matrix Multiplication on CPUs

Authors

Li, Chendi; Jia, Haipeng; Cao, Hang; Yao, Jianyu; Shi, Boqian; Xiang, Chunyang; Sun, Jinbo; Lu, Pengqi; Zhang, Yunquan

Abstract

In recent years, general matrix-matrix multiplication with non-regular-shaped input matrices has been widely used in many applications, such as deep learning, and has drawn increasing attention. However, conventional implementations are not suited to non-regular-shaped matrix-matrix multiplication, and few works focus on optimizing tall-and-skinny matrix-matrix multiplication on CPUs. This paper proposes AutoTSMM, an auto-tuning framework for building high-performance tall-and-skinny matrix-matrix multiplication. AutoTSMM selects the optimal inner kernels in the install-time stage and generates an execution plan for the pre-pack tall-and-skinny matrix-matrix multiplication in the runtime stage. Experiments demonstrate that AutoTSMM achieves performance competitive with state-of-the-art tall-and-skinny matrix-matrix multiplication implementations. Moreover, it outperforms all of the conventional matrix-matrix multiplication implementations evaluated.
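
To make the two-stage idea above concrete, here is a minimal Python sketch under stated assumptions. It is not AutoTSMM's code: the two candidate kernels (`kernel_dot`, `kernel_axpy`), their packing routines, `tune`, `make_plan`, `tsmm`, and the `block_rows` parameter are all hypothetical stand-ins, and a real implementation would use vectorized (SIMD) micro-kernels in C or assembly rather than Python loops. The install-time step times each candidate on a representative tall-and-skinny shape and keeps the fastest; the runtime step pre-packs B once and splits the M rows of A into blocks that the chosen kernel processes.

```python
import time
from typing import Callable, Dict, List, Tuple

Matrix = List[List[float]]

# --- Hypothetical candidate inner kernels for C = A @ B (A: M x K, B: K x N) ---

def pack_cols(b: Matrix) -> Matrix:
    # Pack B column by column so the dot-product kernel reads contiguous lists.
    return [list(col) for col in zip(*b)]

def pack_rows(b: Matrix) -> Matrix:
    # Keep B row-major (as a copy); the axpy kernel streams over rows of B.
    return [list(row) for row in b]

def kernel_dot(a: Matrix, pb: Matrix, c: Matrix, row0: int, rows: int) -> None:
    # c[i][j] = dot(a[i], column j of B), with B pre-packed by columns.
    for i in range(row0, row0 + rows):
        a_row = a[i]
        c[i] = [sum(x * y for x, y in zip(a_row, col)) for col in pb]

def kernel_axpy(a: Matrix, pb: Matrix, c: Matrix, row0: int, rows: int) -> None:
    # c[i] = sum over p of a[i][p] * (row p of B), accumulated row by row.
    n = len(pb[0])
    for i in range(row0, row0 + rows):
        acc = [0.0] * n
        for a_ip, b_row in zip(a[i], pb):
            for j in range(n):
                acc[j] += a_ip * b_row[j]
        c[i] = acc

# Registry of candidates: name -> (packing routine, inner kernel).
CANDIDATES: Dict[str, Tuple[Callable, Callable]] = {
    "dot": (pack_cols, kernel_dot),
    "axpy": (pack_rows, kernel_axpy),
}

def tune(m: int = 256, k: int = 8, n: int = 8, reps: int = 3) -> str:
    # Install-time stage: time each candidate on a representative
    # tall-and-skinny shape and return the name of the fastest one.
    a = [[float((i + p) % 7) for p in range(k)] for i in range(m)]
    b = [[float((p * n + j) % 5) for j in range(n)] for p in range(k)]
    best_name, best_time = "", float("inf")
    for name, (pack, kernel) in CANDIDATES.items():
        pb = pack(b)
        c: Matrix = [[] for _ in range(m)]
        start = time.perf_counter()
        for _ in range(reps):
            kernel(a, pb, c, 0, m)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name

def make_plan(m: int, block_rows: int) -> List[Tuple[int, int]]:
    # Runtime stage: split the M rows of A into (start, length) blocks;
    # the last block may be shorter than block_rows.
    return [(r, min(block_rows, m - r)) for r in range(0, m, block_rows)]

def tsmm(a: Matrix, b: Matrix, kernel_name: str, block_rows: int = 64) -> Matrix:
    # Pre-pack B once, then run the selected kernel over every block of the plan.
    pack, kernel = CANDIDATES[kernel_name]
    pb = pack(b)
    c: Matrix = [[] for _ in range(len(a))]
    for row0, rows in make_plan(len(a), block_rows):
        kernel(a, pb, c, row0, rows)
    return c

if __name__ == "__main__":
    chosen = tune()  # result depends on the machine's timings
    a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    b = [[1.0, 0.0, 2.0], [0.0, 1.0, 3.0]]
    # Either kernel yields C = [[1.0, 2.0, 8.0], [3.0, 4.0, 18.0], [5.0, 6.0, 28.0]]
    print(chosen, tsmm(a, b, chosen, block_rows=2))
```

Separating the stages means the timing runs happen once per machine, while each runtime call only pays for packing B and walking the plan. In the shape assumed here (K and N small), B is small, so packing it is cheap relative to the M x K x N multiply work.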
