Paper title
STREAmS: a high-fidelity accelerated solver for direct numerical simulation of compressible turbulent flow
Paper authors
Abstract
We present STREAmS, an in-house high-fidelity solver for large-scale, massively parallel direct numerical simulations (DNS) of compressible turbulent flows on graphics processing units (GPUs). STREAmS is written in the Fortran 90 language and is tailored to carry out DNS of canonical compressible wall-bounded flows, namely turbulent plane channel flow, the zero-pressure-gradient turbulent boundary layer, and supersonic oblique shock-wave/boundary-layer interactions. The solver incorporates state-of-the-art numerical algorithms specifically designed to cope with the challenging problems associated with the solution of high-speed turbulent flows. It can be used across a wide range of Mach numbers, from the low subsonic up to the hypersonic regime. The use of CUF automatic kernels allowed easy and efficient porting to the GPU architecture while minimizing changes to the original CPU code, which is also maintained. We discuss a memory allocation strategy based on duplicated host and device arrays that carefully minimizes memory usage, making the solver suitable for large-scale computations on the latest GPU cards. Comparisons between different CPU and GPU architectures strongly favor the latter: executing the solver on a single NVIDIA Tesla P100 corresponds to using approximately 330 Intel Knights Landing CPU cores. STREAmS shows very good strong scalability and essentially ideal weak scalability up to 2048 GPUs. This paves the way for simulations in the genuine high-Reynolds-number regime, possibly at friction Reynolds numbers $Re_\tau > 10^4$. The solver is released open source under the GPLv3 license and is available at https://github.com/matteobernardini/STREAmS.
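The performance claims above can be stated with the standard parallel-efficiency definitions. These are strong scaling (fixed total problem size), weak scaling (fixed problem size per GPU), and a "CPU cores per GPU" equivalence. Below is a minimal Python sketch that assumes wall-clock time per time step is measured at each device count. The function names and the timings in the usage lines are hypothetical illustrations. They are not the STREAmS benchmarking procedure or its measured data.

```python
def strong_scaling_efficiency(n_ref, t_ref, n, t):
    """Strong scaling: the total problem size is fixed.

    Ideal behaviour is t = t_ref * n_ref / n, which gives an efficiency of 1.0.
    """
    if min(n_ref, n) <= 0 or min(t_ref, t) <= 0:
        raise ValueError("device counts and times must be positive")
    return (t_ref * n_ref) / (t * n)


def weak_scaling_efficiency(t_ref, t):
    """Weak scaling: the problem size per device is fixed.

    Ideal behaviour is a constant time per step, which gives an efficiency of 1.0.
    """
    if t_ref <= 0 or t <= 0:
        raise ValueError("times must be positive")
    return t_ref / t


def equivalent_cpu_cores(n_cores, t_cpu, t_gpu):
    """Number of CPU cores that one GPU is 'worth' on the same problem.

    Assumes (hypothetically) that the CPU run scales linearly with core count,
    so n_cores * t_cpu is the single-core-equivalent time.
    """
    if n_cores <= 0 or t_cpu <= 0 or t_gpu <= 0:
        raise ValueError("core count and times must be positive")
    return n_cores * t_cpu / t_gpu


if __name__ == "__main__":
    # Hypothetical timings (seconds per time step), for illustration only.
    print(strong_scaling_efficiency(16, 1.0, 64, 0.3))
    print(weak_scaling_efficiency(0.50, 0.52))
    print(equivalent_cpu_cores(64, 3.0, 0.75))
```

Efficiency is measured relative to a reference run (`n_ref` devices) rather than a single device. Large DNS grids typically do not fit on one GPU, so the smallest feasible device count serves as the baseline.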