Paper title

CMacIonize 2.0: a novel task-based approach to Monte Carlo radiation transfer

Authors

Bert Vandenbroucke, Peter Camps

Abstract

(Context) Monte Carlo radiative transfer (MCRT) is a widely used technique to model the interaction between radiation and a medium, and plays an important role in astrophysical modelling and when comparing those models with observations. (Aims) In this work, we present a novel approach to MCRT that addresses the challenging memory access patterns of traditional MCRT algorithms, which hinder optimal performance of MCRT simulations on modern hardware with a complex memory architecture. (Methods) We reformulate the MCRT photon packet life cycle as a task-based algorithm, whereby the computation is broken down into small tasks that are executed concurrently. Photon packets are stored in intermediate buffers, and tasks propagate photon packets through small parts of the computational domain, moving them from one buffer to another in the process. (Results) Using the implementation of the new algorithm in the photoionization MCRT code CMacIonize 2.0, we show that decomposing the MCRT grid into small parts leads to a significant performance gain during the photon packet propagation phase, which constitutes the bulk of an MCRT algorithm, as a result of better usage of memory caches. Our new algorithm is a factor of 2 to 4 faster than an equivalent traditional algorithm and shows good strong scaling up to 30 threads. We briefly discuss how our new algorithm could be adjusted or extended to other astrophysical MCRT applications. (Conclusions) We show that optimising the memory access patterns of a memory-bound algorithm such as MCRT can yield significant performance gains.
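The buffer-and-task scheme described under (Methods) can be illustrated with a small toy model. The sketch below is not CMacIonize code: it assumes a 1D, purely absorbing medium with uniform opacity, and it drains a single-threaded task queue where a real implementation would hand tasks to concurrent worker threads. Every name in it (`PhotonPacket`, `propagate_all`, `buffer_size`, and so on) is hypothetical and chosen only for illustration.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class PhotonPacket:
    x: float        # position in the 1D domain [0, 1]
    direction: int  # +1 (right) or -1 (left)
    tau: float      # optical depth left before the packet is absorbed


def propagate_all(num_packets=1000, num_subgrids=4, cells_per_subgrid=8,
                  opacity=5.0, buffer_size=100, seed=42):
    """Toy task-based propagation: returns (absorption counts, escaped count)."""
    rng = random.Random(seed)
    width = 1.0 / num_subgrids
    # Each subgrid owns its own small block of cells (assumed cache friendly).
    absorbed = [[0] * cells_per_subgrid for _ in range(num_subgrids)]
    escaped = 0
    tasks = deque()  # each task is (subgrid index, buffer of packets)

    # Source step: fill fixed-size buffers with packets emitted at x = 0.5.
    source_x = 0.5
    source_subgrid = min(int(source_x / width), num_subgrids - 1)
    remaining = num_packets
    while remaining > 0:
        count = min(buffer_size, remaining)
        buffer = [PhotonPacket(source_x, rng.choice((-1, 1)),
                               rng.expovariate(1.0)) for _ in range(count)]
        tasks.append((source_subgrid, buffer))
        remaining -= count

    # Propagation tasks: each one touches a single subgrid only.
    while tasks:
        s, buffer = tasks.popleft()
        lo, hi = s * width, (s + 1) * width
        outgoing = {-1: [], +1: []}  # one output buffer per neighbour
        for p in buffer:
            edge = hi if p.direction > 0 else lo
            dist_to_edge = abs(edge - p.x)
            dist_to_absorb = p.tau / opacity
            if dist_to_absorb < dist_to_edge:
                # Absorbed inside this subgrid: record it in the local cell.
                p.x += p.direction * dist_to_absorb
                cell = int((p.x - lo) / width * cells_per_subgrid)
                cell = max(0, min(cell, cells_per_subgrid - 1))
                absorbed[s][cell] += 1
            else:
                # Reaches the subgrid edge: move it to the neighbour's buffer.
                p.tau -= dist_to_edge * opacity
                p.x = edge
                outgoing[p.direction].append(p)
        for direction, packets in outgoing.items():
            if not packets:
                continue
            neighbour = s + direction
            if 0 <= neighbour < num_subgrids:
                tasks.append((neighbour, packets))  # spawn a follow-up task
            else:
                escaped += len(packets)  # left the computational domain
    return absorbed, escaped


if __name__ == "__main__":
    counts, lost = propagate_all()
    print("absorbed per subgrid:", [sum(row) for row in counts])
    print("escaped:", lost)
```

Because each task reads and writes the cells of one subgrid only, its working set stays small, which is the property the abstract credits for the improved cache usage. In a multithreaded version, tasks acting on different subgrids could run at the same time, with some form of locking or scheduling constraint (not shown here) to keep two tasks from updating the same subgrid at once.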
