Demystifying the Performance of HPC Scientific Applications on NVM-based Memory Systems

Paper Title

Demystifying the Performance of HPC Scientific Applications on NVM-based Memory Systems

Authors

Ivy Peng, Kai Wu, Jie Ren, Dong Li, Maya Gokhale

Abstract

The emergence of high-density byte-addressable non-volatile memory (NVM) promises to accelerate data- and compute-intensive applications. Current NVM technologies have lower performance than DRAM and are therefore often paired with DRAM in a heterogeneous main memory. Byte-addressable NVM hardware has recently become available. This work provides a timely evaluation of representative HPC applications from the "Seven Dwarfs" on NVM-based main memory. Our results quantify the effectiveness of DRAM-cached-NVM for accelerating HPC applications and for enabling large problems that exceed the DRAM capacity. On uncached-NVM, HPC applications exhibit three tiers of performance sensitivity: insensitive, scaled, and bottlenecked. We identify write throttling and concurrency control as the priorities in optimizing applications. We highlight that a change in concurrency may have diverging effects on the read and write accesses of an application. Based on these findings, we explore two optimization approaches. First, we provide a prediction model that uses data from a small set of configurations to estimate performance at various concurrency levels and data sizes, avoiding an exhaustive search of the configuration space. Second, we demonstrate that write-aware data placement on uncached-NVM could achieve a 2x performance improvement with a 60% reduction in DRAM usage.
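The abstract does not describe how write-aware data placement is carried out. As an illustration of the general idea only, the sketch below assumes that per-buffer write counts are available from profiling and greedily keeps the most write-intensive buffers in DRAM, leaving everything else on NVM. The `Buffer` type, the `place_write_aware` function, and the density heuristic are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    """A data structure with profiled access counts (hypothetical input)."""
    name: str
    size_bytes: int
    writes: int
    reads: int


def place_write_aware(buffers, dram_budget_bytes):
    """Greedy sketch: keep the most write-dense buffers in DRAM.

    Assumption: writes are the costly operation on NVM, so buffers are
    ranked by writes per byte and placed in DRAM while the budget lasts.
    Buffers with no writes always go to NVM.
    """
    placement = {}
    remaining = dram_budget_bytes
    ranked = sorted(
        buffers,
        key=lambda b: b.writes / max(b.size_bytes, 1),
        reverse=True,
    )
    for b in ranked:
        if b.writes > 0 and b.size_bytes <= remaining:
            placement[b.name] = "DRAM"
            remaining -= b.size_bytes
        else:
            placement[b.name] = "NVM"
    return placement


if __name__ == "__main__":
    demo = [
        Buffer("grid", 100, 1000, 1000),
        Buffer("lookup", 400, 0, 5000),
        Buffer("halo", 50, 200, 200),
    ]
    print(place_write_aware(demo, dram_budget_bytes=120))
```

In this toy input, the read-only `lookup` table stays on NVM regardless of its size, which is the kind of decision that lets DRAM usage shrink without exposing NVM write costs.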
