论文标题
Allreduce操作的概括
A Generalization of the Allreduce Operation
论文作者
论文摘要
Alleduce是最常用的MPI集体行动之一,因此其表现在过去几十年中引起了很多关注。开发了许多具有不同属性和目的的算法。我们提出了一种基于受障碍立方体数学启发的排列的新颖方法来描述,其中移动形成了一种称为组的数学结构。同样,置换组可以描述一组$ p $过程之间的循环通信模式。这种新方法允许构建广泛使用的Alleduce算法的概括,例如环,递归加倍和递归减半。使用开发的方法,我们构建了一种算法,该算法成功地解决了两次非功率的流程的众所周知问题,这些问题破坏了许多现有算法的性能。所提出的算法为任何数量的流程提供了一个通用解决方案,该过程的通信步骤在$ \ lceil \ log {p} \ rceil $之间的延迟 - 原始版本和$ 2 \ cdot \ lceil \ log \ log {p} \ rceil $之间的$ \ lceil \ log {p} \ rceil $之间提供了通用解决方案。
Allreduce is one of the most frequently used MPI collective operations, and thus its performance attracts much attention in the past decades. Many algorithms were developed with different properties and purposes. We present a novel approach to communication description based on the permutations inspired by the mathematics of a Rubik's cube where the moves form a mathematical structure called group. Similarly, cyclic communication patterns between a set of $P$ processes may be described by a permutation group. This new approach allows constructing a generalization of the widely used Allreduce algorithms such as Ring, Recursive Doubling and Recursive Halving. Using the developed approach we build an algorithm that successfully solves the well-known problem of the non-power-of-two number of processes which breaks down the performance of many existing algorithms. The proposed algorithm provides a general solution for any number of processes with the dynamically changing amount of communication steps between $\lceil \log{P} \rceil$ for the latency-optimal version and $2 \cdot \lceil \log{P} \rceil$ for the bandwidth-optimal case.