论文标题
二进制叉模型的低深度平行算法没有原子
Low-Depth Parallel Algorithms for the Binary-Forking Model without Atomics
论文作者
论文摘要
二进制装饰模型是平行计算模型,由Blelloch等人正式定义。最近,线程可以递归和异步分叉并发的子线。该模型会产生$θ(\ log n)$的成本,以产卵或同步$ n $任务或线程。二进制装饰模型实际捕获了使用现代多线程编程语言在多核心共享计算机上实现的并行算法的性能。相比之下,经过广泛研究的理论婴儿车模型不考虑产卵和同步线的成本,因此,在二进制装饰模型中,实现婴儿车模型中达到最佳性能界限的算法可能不是最佳的。通常,需要重新设计算法,以在二进制架模型中实现最佳性能界限,而非恒定同步成本使任务具有挑战性。 尽管二进制叉模型允许使用原子{\ em test-set}(ts)指令,以减少某些同步开销,假设此类说明的可用性使硬件上有更强大的要求,并且可能会限制使用它们使用它们的算法的可移植性。在本文中,我们避免在算法中使用锁和原子说明,除非可能在运行时系统实现的联接操作中。 在本文中,我们在没有原子的二进制架架模型中设计有效的并行算法,这些算法对于三个基本问题:Strassen的(和Strassen样)矩阵乘法(MM),基于比较的分类和快速的傅立叶变换(FFT)。我们的所有结果都改善了有或没有原子的二进制叉模型中相应问题的已知结果。
The binary-forking model is a parallel computation model, formally defined by Blelloch et al. very recently, in which a thread can fork a concurrent child thread, recursively and asynchronously. The model incurs a cost of $Θ(\log n)$ to spawn or synchronize $n$ tasks or threads. The binary-forking model realistically captures the performance of parallel algorithms implemented using modern multithreaded programming languages on multicore shared-memory machines. In contrast, the widely studied theoretical PRAM model does not consider the cost of spawning and synchronizing threads, and as a result, algorithms achieving optimal performance bounds in the PRAM model may not be optimal in the binary-forking model. Often, algorithms need to be redesigned to achieve optimal performance bounds in the binary-forking model and the non-constant synchronization cost makes the task challenging. Though the binary-forking model allows the use of atomic {\em test-and-set} (TS) instructions to reduce some synchronization overhead, assuming the availability of such instructions puts a stronger requirement on the hardware and may limit the portability of the algorithms using them. In this paper, we avoid the use of locks and atomic instructions in our algorithms except possibly inside the join operation which is implemented by the runtime system. In this paper, we design efficient parallel algorithms in the binary-forking model without atomics for three fundamental problems: Strassen's (and Strassen-like) matrix multiplication (MM), comparison-based sorting, and the Fast Fourier Transform (FFT). All our results improve over known results for the corresponding problem in the binary-forking model both with and without atomics.