论文标题

在分布式内存模型中

Communication-Optimal Parallel Standard and Karatsuba Integer Multiplication in the Distributed Memory Model

论文作者

De Stefani, Lorenzo

论文摘要

我们向COPSIM提出了用于分布式内存设置的标准整数乘法的并行实现,并为分布式内存设置的Karatsuba快速整数乘法算法的并行实现。当使用$ \ Mathcal {P} $处理器时,每个处理器都配备了本地非共享内存,以计算$ n $ digits Integer数字的产品,在温和条件下,我们的算法实现了计算时间的最佳加速。也就是说,$ \ Mathcal {o} \ left(n^2/\ Mathcal {p} \ right)$ for Copsim和$ \ Mathcal {O} \ left(n^{\ log_2 3}/\ Mathcal {p} \ right)整个处理器所需的内存总量为$ \ Mathcal {O} \ left(n \右)$,即在存储输入值所需的最小空间的恒定因素。我们严格分析了所提出算法的输入/输出(I/O)成本。我们表明,它们的带宽成本(即,至少一个处理器发送或接收的存储单词数)偶然地匹配已知的I/O下限及其延迟(即,在Algorithm的关键执行路径中发送或接收的消息的数量)是在综合因子$ \ MATHCCAL中均无限于MATHCAL {O \ MATHCAL(o)。 \ Mathcal {p} \ right)$的$,已知I/O下限。因此,我们的算法在带宽成本方面渐近是最佳的,并且在延迟成本方面几乎渐近最佳。

We present COPSIM a parallel implementation of standard integer multiplication for the distributed memory setting, and COPK a parallel implementation of Karatsuba's fast integer multiplication algorithm for a distributed memory setting. When using $\mathcal{P}$ processors, each equipped with a local non-shared memory, to compute the product of tho $n$-digits integer numbers, under mild conditions, our algorithms achieve optimal speedup of the computational time. That is, $\mathcal{O}\left(n^2/\mathcal{P}\right)$ for COPSIM, and $\mathcal{O}\left(n^{\log_2 3}/\mathcal{P}\right)$ for COPK. The total amount of memory required across the processors is $\mathcal{O}\left(n\right)$, that is, within a constant factor of the minimum space required to store the input values. We rigorously analyze the Input/Output (I/O) cost of the proposed algorithms. We show that their bandwidth cost (i.e., the number of memory words sent or received by at least one processors) matches asymptotically corresponding known I/O lower bounds, and their latency (i.e., the number of messages sent or received in the algorithm's critical execution path) is asymptotically within a multiplicative factor $\mathcal{O}\left(\log^2_2 \mathcal{P}\right)$ of the corresponding known I/O lower bounds. Hence, our algorithms are asymptotically optimal with respect to the bandwidth cost and almost asymptotically optimal with respect to the latency cost.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源