Paper Title
Benchmarking the Linear Algebra Awareness of TensorFlow and PyTorch
Paper Authors
Paper Abstract
Linear algebra operations, which are ubiquitous in machine learning, form major performance bottlenecks. The High-Performance Computing community invests significant effort in the development of architecture-specific optimized kernels, such as those provided by the BLAS and LAPACK libraries, to speed up linear algebra operations. However, end users are progressively less likely to go through the error-prone and time-consuming process of directly using said kernels; instead, frameworks such as TensorFlow (TF) and PyTorch (PyT), which facilitate the development of machine learning applications, are becoming increasingly popular. Although such frameworks link to BLAS and LAPACK, it is not clear whether they make use of linear algebra knowledge to speed up computations. For this reason, in this paper we develop benchmarks to investigate the linear algebra optimization capabilities of TF and PyT. Our analyses reveal that a number of linear algebra optimizations are still missing; for instance, reducing the number of scalar operations by applying the distributive law, and automatically identifying the optimal parenthesization of a matrix chain. In this work, we focus on linear algebra computations in TF and PyT; we both expose opportunities for performance enhancement to the benefit of the developers of the frameworks, and provide end users with guidelines on how to achieve performance gains.
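To make the two missing optimizations concrete, below is a minimal PyTorch sketch. It is our illustration, not the paper's benchmark suite: the matrix sizes and the timeit helper are assumptions chosen to make the effects visible. It relies only on the fact that an expression such as A @ B @ C is evaluated left to right, so the user, rather than the framework, currently decides the parenthesization and whether the distributive law is applied.

```python
# Minimal sketch (illustrative, not the paper's benchmarks) of the two
# optimizations the abstract names, using PyTorch. Sizes are assumptions.
import time
import torch

torch.manual_seed(0)

def timeit(fn, repeats=10):
    # Crude wall-clock timing; adequate for CPU tensors of this size.
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats

# 1) Matrix chain parenthesization: A @ B @ C evaluates as (A @ B) @ C.
#    With this tall-and-skinny A, that builds a large n x n intermediate,
#    while the mathematically equivalent A @ (B @ C) never does.
n = 3000
A = torch.randn(n, 1)
B = torch.randn(1, n)
C = torch.randn(n, 1)
print("left-to-right chain:", timeit(lambda: (A @ B) @ C))
print("optimal chain      :", timeit(lambda: A @ (B @ C)))

# 2) Distributive law: M @ X + M @ Y costs two matrix products; the
#    equivalent M @ (X + Y) costs one product plus a cheap addition.
m = 2000
M = torch.randn(m, m)
X = torch.randn(m, m)
Y = torch.randn(m, m)
print("two products       :", timeit(lambda: M @ X + M @ Y))
print("distributed form   :", timeit(lambda: M @ (X + Y)))
```

In both cases the rewritten expression is algebraically identical to the original; the gap in running time comes entirely from how the computation is mapped onto kernel calls, which is exactly the kind of decision the paper checks whether TF and PyT make automatically.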