Factorized Machine Learning for Performance Modeling of Massively Parallel Heterogeneous Physical Simulations

Title

Factorized Machine Learning for Performance Modeling of Massively Parallel Heterogeneous Physical Simulations

Authors

Oskooi, Ardavan; Hogan, Christopher; Hammond, Alec M.; Reid, M. T. Homer; Johnson, Steven G.

Abstract

We demonstrate neural-network runtime prediction for complex, many-parameter, massively parallel, heterogeneous-physics simulations running on cloud-based MPI clusters. Because individual simulations are so expensive, it is crucial to train the network on a limited dataset despite the potentially large input space of the physics at each point in the spatial domain. We achieve this using a two-part strategy. First, we perform data-driven static load balancing using regression coefficients extracted from small simulations, which both improves parallel performance and reduces the dependence of the runtime on the precise spatial layout of the heterogeneous physics. Second, we divide the execution time of these load-balanced simulations into computation and communication, factor crude asymptotic scalings out of each term, and train neural nets for the remaining factor coefficients. This strategy is implemented for Meep, a popular and complex open-source electrodynamics simulation package, and is validated on heterogeneous simulations drawn from published engineering models.
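The two-part strategy can be illustrated with a minimal Python sketch. This is not the authors' implementation and not Meep's API; every function name, the three voxel categories (plain dielectric, dispersive material, PML), the 1D slab partition, and the specific asymptotic scalings are hypothetical choices made only for illustration. Step 1 fits per-voxel cost coefficients by least squares from small benchmark runs and then cuts the domain into chunks of roughly equal estimated cost. Step 2 writes the time per step as a computation term plus a communication term, each equal to a crude scaling times a residual coefficient; in the abstract those coefficients come from trained neural nets, whereas here they are simply passed in as numbers.

```python
import numpy as np

# Step 1: data-driven static load balancing.
# ASSUMPTION (hypothetical): a voxel's cost per timestep depends linearly on
# which kind of physics it contains. The three categories used below
# (plain dielectric, dispersive material, PML) are illustrative only.

def fit_cost_coefficients(counts, times):
    """Least-squares per-voxel cost coefficients from small benchmark runs.

    counts: (n_runs, n_types) number of voxels of each type in each run
    times:  (n_runs,) measured time per timestep of each run
    """
    coeffs, *_ = np.linalg.lstsq(counts, times, rcond=None)
    return coeffs

def balance_1d(slab_cost, n_procs):
    """Cut a 1D row of slabs into n_procs contiguous chunks whose estimated
    costs are as close to equal as the slab granularity allows.
    Assumes all costs are positive. Returns n_procs + 1 boundary indices."""
    prefix = np.concatenate(([0.0], np.cumsum(slab_cost)))
    targets = prefix[-1] * np.arange(1, n_procs) / n_procs
    hi = np.searchsorted(prefix, targets)  # first index with prefix >= target
    lo = hi - 1                            # last index with prefix < target
    # Pick whichever neighbouring cut lands closer to the ideal cumulative cost.
    cuts = np.where(targets - prefix[lo] < prefix[hi] - targets, lo, hi)
    return np.concatenate(([0], cuts, [len(slab_cost)]))

def chunk_costs(slab_cost, bounds):
    """Estimated total cost of each chunk between consecutive boundaries."""
    return np.array([slab_cost[a:b].sum() for a, b in zip(bounds[:-1], bounds[1:])])

# Step 2: factorized runtime model for a load-balanced run.
# ASSUMPTION (hypothetical scalings): for a roughly cubic 3D domain split into
# n_procs equal-cost pieces, computation ~ voxels per process and
# communication ~ surface of one piece ~ (voxels per process)**(2/3).

def asymptotic_factors(n_voxels, n_procs):
    per_proc = n_voxels / n_procs
    return per_proc, per_proc ** (2.0 / 3.0)

def predict_runtime(n_voxels, n_procs, c_comp, c_comm):
    """Time per step = c_comp * compute_scaling + c_comm * comm_scaling.

    c_comp and c_comm are the residual factor coefficients. In the approach
    described in the abstract, trained neural nets would predict them from the
    simulation parameters; in this sketch they are plain input numbers."""
    comp, comm = asymptotic_factors(n_voxels, n_procs)
    return c_comp * comp + c_comm * comm

if __name__ == "__main__":
    # Synthetic small runs: made-up timings generated from coefficients 1, 3, 2.
    counts = np.array([[100, 0, 0], [0, 100, 0], [0, 0, 100], [50, 50, 50]], float)
    times = counts @ np.array([1.0, 3.0, 2.0])
    coeffs = fit_cost_coefficients(counts, times)

    # A larger domain: 4 plain slabs followed by 4 dispersive slabs.
    slab_counts = np.array([[1, 0, 0]] * 4 + [[0, 1, 0]] * 4, float)
    slab_cost = slab_counts @ coeffs
    naive = np.array([0, 4, 8])          # equal number of slabs per process
    balanced = balance_1d(slab_cost, 2)  # equal estimated cost per process
    print("naive chunk costs:   ", chunk_costs(slab_cost, naive))
    print("balanced chunk costs:", chunk_costs(slab_cost, balanced))
    print("predicted time/step: ", predict_runtime(8e6, 64, c_comp=1e-8, c_comm=5e-7))
```

With these made-up numbers the naive split gives chunk costs of about 4 and 12, while the cost-balanced split gives about 7 and 9, so the slowest process carries less work. Factoring the scalings out analytically means the learned part only has to capture the residual coefficients, which is one way to keep the required training set small.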
