Paper Title

FLUPS: a Fourier-based Library of Unbounded Poisson Solvers

Authors

Denis-Gabriel Caprace, Thomas Gillis, Philippe Chatelain

Abstract

A Fourier-based Library of Unbounded Poisson Solvers (FLUPS) for 2D and 3D homogeneous distributed grids is presented. It is designed to handle every possible combination of periodic, symmetric, semi-unbounded and fully unbounded boundary conditions for the Poisson equation on rectangular domains with uniform resolution. FLUPS leverages a dedicated implementation of 3D Fourier transforms to solve the Poisson equation using Green's functions in a fast and memory-efficient way. Several Green's functions are available, optionally with explicit regularization, spectral truncation, or in the form of lattice Green's functions, and they provide verified convergence orders from 2 to spectral-like. The algorithm relies on the FFTW library to perform the 1D transforms, while Message Passing Interface (MPI) communications perform the required remapping of data in memory. For the latter operation, a first implementation resorts to the standard all-to-all routines. A second implementation, featuring non-blocking and persistent point-to-point communications, is however shown to be more efficient in most cases, especially when exploiting shared-memory parallelism with OpenMP. The scalability of the algorithm, aimed at massively parallel architectures, is demonstrated up to 73 720 cores. For typical problems, the results obtained on three different supercomputers show that the weak efficiency remains above 40% and the strong efficiency above 30% when the number of cores is multiplied by 16. These figures are slightly better than those expected from a third-party 3D Fast Fourier Transform (FFT) tool, for which a 20% longer execution time was also measured on average.
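The core idea of solving the Poisson equation by multiplying the transformed right-hand side with a Green's function in Fourier space can be illustrated with a minimal sketch. The code below is not FLUPS and does not use its API: it is a serial NumPy example restricted to the simplest case in the abstract (a fully periodic 2D domain), and the function name `solve_poisson_periodic` and the test field are assumptions made for illustration only. Unbounded and symmetric boundary conditions, the regularized or lattice Green's functions, and the MPI data remapping are not covered here.

```python
import numpy as np

def solve_poisson_periodic(f, length=2 * np.pi):
    """Solve laplacian(u) = f on a square, fully periodic grid.

    Illustrative helper (hypothetical name, not part of FLUPS).
    Returns the zero-mean solution.
    """
    n = f.shape[0]
    # Angular wavenumbers along one axis of the uniform grid.
    k = 2 * np.pi * np.fft.fftfreq(n, d=length / n)
    kx, ky = np.meshgrid(k, k, indexing="ij")
    k2 = kx**2 + ky**2
    # Spectral Green's function of the Laplacian: -1/|k|^2.
    # The k = 0 mode is set to zero, which fixes the mean of u to zero.
    green = np.zeros_like(k2)
    green[k2 > 0] = -1.0 / k2[k2 > 0]
    # Forward transform, multiply by the Green's function, transform back.
    return np.real(np.fft.ifft2(green * np.fft.fft2(f)))

# Manufactured solution: u = sin(x) cos(2y), so laplacian(u) = -5 u.
n = 64
x = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing="ij")
u_exact = np.sin(X) * np.cos(2 * Y)
f = -5.0 * u_exact

u = solve_poisson_periodic(f)
max_error = np.max(np.abs(u - u_exact))
```

For this smooth periodic field the spectral Green's function recovers the manufactured solution up to round-off, which loosely corresponds to the "spectral-like" end of the convergence range mentioned in the abstract.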
