Efficient GPU Thread Mapping on Embedded 2D Fractals

Paper Title

Efficient GPU Thread Mapping on Embedded 2D Fractals

Authors

Navarro, Cristóbal A., Quezada, Felipe A., Hitschfeld, Nancy, Vega, Raimundo, Bustos, Benjamin

Abstract

This work proposes a new approach for mapping GPU threads onto a family of discrete embedded 2D fractals. A block-space map $\lambda: \mathbb{Z}_{\mathbb{E}}^{2} \mapsto \mathbb{Z}_{\mathbb{F}}^{2}$ is proposed, from Euclidean parallel space $\mathbb{E}$ to embedded fractal space $\mathbb{F}$, that maps in $\mathcal{O}(\log_2 \log_2(n))$ time and uses no more than $\mathcal{O}(n^{\mathbb{H}})$ threads, where $\mathbb{H}$ is the Hausdorff dimension of the fractal, making it parallel-space efficient. When compared to a bounding-box (BB) approach, $\lambda(\omega)$ offers a sub-exponential improvement in parallel space and a monotonically increasing speedup for $n \ge n_0$. The Sierpinski gasket fractal is used as a particular case study, and the experimental performance results show that $\lambda(\omega)$ reaches a speedup of up to $9\times$ over the bounding-box approach. A tensor-core based implementation of $\lambda(\omega)$ is also proposed for modern GPUs, providing up to $\sim40\%$ of extra performance. The results obtained in this work show that efficient GPU thread mapping on fractal domains can significantly improve the performance of several applications that work with this type of geometry.
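
To make the parallel-space argument concrete, the following is a minimal sketch and not the paper's $\lambda(\omega)$ map. It assumes one common discrete embedding of the Sierpinski gasket, in which cell $(x, y)$ of an $n \times n$ grid ($n = 2^k$) belongs to the fractal when `x & y == 0`; the embedding and orientation used in the paper may differ. The function names `in_gasket`, `bounding_box_cells`, and `compact_map` are hypothetical. The compact map shown here uses base-3 digits and takes $\mathcal{O}(\log_2 n)$ steps, which is slower than the $\mathcal{O}(\log_2 \log_2(n))$ bound stated in the abstract. It only illustrates that $3^k = n^{\log_2 3}$ indices are enough to cover the fractal, whereas a bounding box launches $n^2$.

```python
import math


def in_gasket(x, y):
    """Assumed embedding: cell (x, y) is in the gasket iff x and y share no set bit."""
    return (x & y) == 0


def bounding_box_cells(k):
    """Bounding-box approach: visit all n*n cells and discard those outside the fractal."""
    n = 1 << k
    return [(x, y) for y in range(n) for x in range(n) if in_gasket(x, y)]


def compact_map(i, k):
    """Map a linear index i in [0, 3**k) to a gasket cell.

    Each base-3 digit of i selects one of the three sub-triangles at that
    scale, so no index is wasted. This is an illustrative O(log n) map,
    not the block-space map from the paper.
    """
    x = y = 0
    for level in range(k):
        i, digit = divmod(i, 3)
        if digit == 1:
            x |= 1 << level
        elif digit == 2:
            y |= 1 << level
    return x, y


k = 4
n = 1 << k
bb = set(bounding_box_cells(k))
compact = {compact_map(i, k) for i in range(3 ** k)}

print("bounding-box indices launched:", n * n)
print("compact indices launched:     ", 3 ** k)
print("n ** log2(3) (rounded):       ", round(n ** math.log2(3)))
print("same set of cells:            ", compact == bb)
```

In this sketch the fraction of useful indices under the bounding box is $(3/4)^k$, which shrinks as $n$ grows; this is the kind of waste a compact thread map is meant to avoid.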
