Paper Title
Accelerating Multigrid-based Hierarchical Scientific Data Refactoring on GPUs
Paper Authors
Paper Abstract
Rapid growth in scientific data and a widening gap between computational speed and I/O bandwidth make it increasingly infeasible to store and share all data produced by scientific simulations. Instead, we need methods for reducing data volumes: ideally, methods that can scale data volumes adaptively so as to enable negotiation of performance and fidelity tradeoffs in different situations. Multigrid-based hierarchical data representations hold promise as a solution to this problem, allowing for flexible conversion between different fidelities so that, for example, data can be created at high fidelity and then transferred or stored at lower fidelity via logically simple and mathematically sound operations. However, the effective use of such representations has been hindered until now by the relatively high costs of creating, accessing, reducing, and otherwise operating on such representations. We describe here highly optimized data refactoring kernels for GPU accelerators that enable efficient creation and manipulation of data in multigrid-based hierarchical forms. We demonstrate that our optimized design can achieve up to 250 TB/s aggregated data refactoring throughput -- 83% of theoretical peak -- on 1024 nodes of the Summit supercomputer. We showcase our optimized design by applying it to a large-scale scientific visualization workflow and the MGARD lossy compression software.
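To make the idea of multigrid-based hierarchical refactoring concrete, the sketch below is a minimal 1-D, CPU-side analogue in Python (not the paper's GPU kernels and not MGARD's actual transform): even-indexed samples form the next-coarser grid, odd samples are replaced by residuals against linear interpolation ("correction coefficients"), and the data can be rebuilt either losslessly or at reduced resolution by omitting the finest correction levels. All function names here are illustrative assumptions.

```python
import numpy as np

def decompose(data, levels):
    """Hierarchically refactor a 1-D signal of length 2**levels * m + 1.

    At each level, even-indexed samples form the next-coarser grid;
    odd-indexed samples are replaced by their residuals against linear
    interpolation from that coarser grid (correction coefficients).
    """
    coarse = np.asarray(data, dtype=float)
    corrections = []  # corrections[0] is the finest level, corrections[-1] the coarsest
    for _ in range(levels):
        even, odd = coarse[::2], coarse[1::2]
        predicted = 0.5 * (even[:-1] + even[1:])  # linear interpolation from coarse grid
        corrections.append(odd - predicted)
        coarse = even
    return coarse, corrections

def recompose(coarse, corrections):
    """Invert decompose(); passing only the coarsest corrections yields a lower-fidelity signal."""
    data = coarse
    for corr in reversed(corrections):
        fine = np.empty(2 * len(data) - 1)
        fine[::2] = data
        fine[1::2] = 0.5 * (data[:-1] + data[1:]) + corr
        data = fine
    return data

if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 2**6 + 1)
    signal = np.sin(2 * np.pi * x) + 0.1 * x
    coarse, corr = decompose(signal, levels=3)

    # Lossless round trip: all correction levels restore the original exactly.
    assert np.allclose(recompose(coarse, corr), signal)

    # Reduced fidelity: drop the finest correction level and rebuild a coarser grid
    # (here this reproduces signal[::2], 33 of the original 65 samples).
    lossy = recompose(coarse, corr[-2:])
    assert np.allclose(lossy, signal[::2])
    print(len(signal), len(lossy))
```

In this toy version, storing or transferring only `coarse` plus a prefix of the coarsest correction levels corresponds to keeping a lower-fidelity representation that can later be refined by appending the finer correction levels, which is the flexibility the abstract describes; the paper's contribution is performing such refactoring efficiently with optimized GPU kernels.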