论文标题
多个张量 - 矩阵计算的通信下限和最佳算法
Communication Lower Bounds and Optimal Algorithms for Multiple Tensor-Times-Matrix Computation
论文作者
论文摘要
多张量量 - 矩阵(Multi-TTM)是用于计算和使用Tucker Tensor分解的算法中的关键计算,该计算经常用于多维数据分析。我们建立通信下限,以确定并行执行多TTM计算所需的数据运动。证明的症结在于分析解决受约束的非线性优化问题。我们还提出了一种并行算法来执行此计算,该算法将处理器组织成具有输入张量的两倍的逻辑网格。我们表明,通过正确选择网格维度,该算法的通信成本达到了下限,因此是最佳的通信。最后,我们表明,与表达计算作为一系列张量 - times-matrix操作的直接方法相比,我们的算法可以显着降低通信。
Multiple Tensor-Times-Matrix (Multi-TTM) is a key computation in algorithms for computing and operating with the Tucker tensor decomposition, which is frequently used in multidimensional data analysis. We establish communication lower bounds that determine how much data movement is required to perform the Multi-TTM computation in parallel. The crux of the proof relies on analytically solving a constrained, nonlinear optimization problem. We also present a parallel algorithm to perform this computation that organizes the processors into a logical grid with twice as many modes as the input tensor. We show that with correct choices of grid dimensions, the communication cost of the algorithm attains the lower bounds and is therefore communication optimal. Finally, we show that our algorithm can significantly reduce communication compared to the straightforward approach of expressing the computation as a sequence of tensor-times-matrix operations.