Title
Four-splitting based coarse-grained multicomputer parallel algorithm for the optimal binary search tree problem
Authors
Abstract
This paper presents a parallel solution to the optimal binary search tree problem based on the coarse-grained multicomputer (CGM) model and the four-splitting technique. Knuth's well-known sequential algorithm solves this problem in $\mathcal{O}\left(n^2\right)$ time and space, where $n$ is the number of keys used to build the optimal binary search tree. To parallelize this algorithm on the CGM model, researchers have proposed the irregular partitioning technique, which subdivides the dependency graph into subgraphs (or blocks) of variable size. This technique addresses the trade-off between minimizing the number of communication rounds and balancing the load of the processors. However, it induces high processor latency, which accounts for most of the global communication time. The cause is that varying the block sizes prevents processors from starting to evaluate some blocks as soon as the data those blocks need are available. The four-splitting technique proposed in this paper overcomes this shortcoming by evaluating each block as a sequence of computation and communication steps over four subblocks. The resulting CGM-based parallel solution requires $\mathcal{O}\left(n^2/\sqrt{p}\right)$ execution time and $\mathcal{O}\left(k\sqrt{p}\right)$ communication rounds, where $p$ is the number of processors and $k$ is the number of times the block size is subdivided. An experimental study on 128 processors with 40960 keys evaluated the performance of this solution. With $k = 2$, its speedup factor reaches up to $\times$13.12, compared with up to $\times$10.39 for the solution based on the irregular partitioning technique. With $k = 5$, its speedup factor rises to $\times$14.93.
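As a point of reference for the sequential baseline cited above, here is a minimal Python sketch of Knuth's $\mathcal{O}(n^2)$ dynamic program. It is only the sequential algorithm, not the paper's CGM parallel solution. The sketch makes several assumptions:

- It uses Knuth's classic cost formulation, with key weights `p` and gap weights `q`. The abstract does not state the cost model.
- The function name `optimal_bst` is hypothetical.
- The table entry `c[i][j]` covers keys `i+1..j` and depends on `c[i][t-1]` and `c[t][j]` for `i < t <= j`, that is, on shorter intervals in the same row and column. This dependency graph is what the irregular partitioning and four-splitting techniques subdivide into blocks.

```python
from typing import List, Optional, Tuple


def optimal_bst(p: List[float],
                q: Optional[List[float]] = None) -> Tuple[float, List[List[int]]]:
    """Sketch of Knuth's O(n^2) dynamic program for the optimal BST.

    Assumed cost model (Knuth's formulation): p[k-1] is the access weight of
    key k (1 <= k <= n) and q[i] (0 <= i <= n) is the weight of an
    unsuccessful search falling between keys i and i+1.
    Returns the optimal cost c[0][n] and the root table r.
    """
    n = len(p)
    if q is None:
        q = [0] * (n + 1)  # assumption: only successful searches are weighted
    # c[i][j], w[i][j] and r[i][j] describe the subtree holding keys i+1..j.
    c = [[0.0] * (n + 1) for _ in range(n + 1)]
    w = [[0.0] * (n + 1) for _ in range(n + 1)]
    r = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        w[i][i] = q[i]
    for d in range(1, n + 1):          # d = number of keys in the subtree
        for i in range(n - d + 1):
            j = i + d
            w[i][j] = w[i][j - 1] + p[j - 1] + q[j]
            if d == 1:
                lo, hi = j, j          # a single key is its own root
            else:
                # Knuth's monotonicity: r[i][j-1] <= r[i][j] <= r[i+1][j]
                lo, hi = r[i][j - 1], r[i + 1][j]
            best, best_t = float("inf"), lo
            for t in range(lo, hi + 1):  # try each candidate root t
                cost = c[i][t - 1] + c[t][j]
                if cost < best:          # strict '<' keeps the leftmost optimum
                    best, best_t = cost, t
            c[i][j] = w[i][j] + best
            r[i][j] = best_t
    return c[0][n], r
```

Without the bound on candidate roots, the recurrence costs $\mathcal{O}(n^3)$. Restricting `t` to `r[i][j-1]..r[i+1][j]` makes the ranges scanned along each diagonal telescope to $\mathcal{O}(n)$ work, which gives the $\mathcal{O}(n^2)$ total cited in the abstract.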