Paper title

Batched computation of the singular value decompositions of order two by the AVX-512 vectorization

Author

Novaković, Vedran

Abstract

This paper proposes a vectorized algorithm for simultaneously computing up to eight singular value decompositions (SVDs, each of the form $A=U\Sigma V^{\ast}$) of real or complex matrices of order two. The algorithm extends to a batch of matrices of arbitrary length $n$, which arises, for example, in the annihilation part of the parallel Kogbetliantz algorithm for the SVD of a square matrix of order $2n$. The SVD algorithm for a single matrix of order two is derived first. It scales the input matrix $A$, in most instances error-free, so that its singular values $\Sigma_{ii}$ cannot overflow whenever its elements are finite. It then computes the URV factorization of the scaled matrix, followed by the SVD of the non-negative upper-triangular middle factor. A vector-friendly data layout for the batch is then introduced, in which the same-indexed elements of each of the input and the output matrices form vectors, and the algorithm's steps over such vectors are described. The vectorized approach is then shown to be about three times faster than processing each matrix in isolation, while slightly improving accuracy over the straightforward method for the $2\times 2$ SVD.
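The batch layout and the scaling step described in the abstract can be illustrated with a minimal sketch. It assumes real, finite, double-precision inputs. It is not the paper's algorithm: in place of the URV factorization followed by a triangular SVD, it uses a textbook closed-form $2\times 2$ SVD, and a plain Python loop stands in for the AVX-512 vector lanes. The function names `svd2x2_real` and `batched_svd2x2`, and the choice to return the scaling exponent `e` separately, are hypothetical.

```python
import math

def svd2x2_real(a, b, c, d):
    """Closed-form SVD of the real matrix [[a, b], [c, d]] (assumed finite).

    Returns (U, (s1, s2), V, e) such that A = U * diag(s1, s2) * 2**e * V^T
    with s1 >= s2 >= 0. This is a textbook formula, not the URV-based method.
    """
    m = max(abs(a), abs(b), abs(c), abs(d))
    if m == 0.0:
        identity = ((1.0, 0.0), (0.0, 1.0))
        return identity, (0.0, 0.0), identity, 0
    # Scale by a power of two so the largest magnitude lies in [0.5, 1).
    # This is exact unless a small element underflows.
    e = math.frexp(m)[1]
    a, b, c, d = (math.ldexp(x, -e) for x in (a, b, c, d))
    E, F = (a + d) / 2, (a - d) / 2
    G, H = (c + b) / 2, (c - b) / 2
    Q, R = math.hypot(E, H), math.hypot(F, G)
    s1, s2 = Q + R, Q - R              # s2 may be negative at this point
    a1, a2 = math.atan2(G, F), math.atan2(H, E)
    theta, phi = (a2 - a1) / 2, (a2 + a1) / 2
    cu, su = math.cos(phi), math.sin(phi)
    cv, sv = math.cos(theta), math.sin(theta)
    # Move a negative sign from s2 into the second column of V.
    sign = -1.0 if s2 < 0.0 else 1.0
    U = ((cu, -su), (su, cu))
    V = ((cv, sign * sv), (-sv, sign * cv))
    return U, (s1, abs(s2)), V, e

def batched_svd2x2(a11, a21, a12, a22):
    """Batch stored as four same-indexed vectors (structure of arrays)."""
    keys = ("u11", "u21", "u12", "u22", "s1", "s2",
            "v11", "v21", "v12", "v22", "e")
    out = {k: [] for k in keys}
    for p, q, r, s in zip(a11, a21, a12, a22):
        # p = A[0][0], q = A[1][0], r = A[0][1], s = A[1][1] of one matrix
        U, (s1, s2), V, e = svd2x2_real(p, r, q, s)
        values = (U[0][0], U[1][0], U[0][1], U[1][1], s1, s2,
                  V[0][0], V[1][0], V[0][1], V[1][1], e)
        for k, v in zip(keys, values):
            out[k].append(v)
    return out
```

Each output list holds the same-indexed element of every result matrix, mirroring the input layout, so element `i` of all lists together describes the SVD of the `i`-th matrix. The unscaled singular values are recovered as `math.ldexp(out["s1"][i], out["e"][i])`, which keeps the scaled values representable even when the true singular value would overflow.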
