Towards Linearly Scaling and Chemically Accurate Global Machine Learning Force Fields for Large Molecules

Paper Title

Towards Linearly Scaling and Chemically Accurate Global Machine Learning Force Fields for Large Molecules

Authors

Adil Kabylda, Valentin Vassilev-Galindo, Stefan Chmiela, Igor Poltavsky, Alexandre Tkatchenko

Abstract

Machine learning force fields (MLFFs) are gradually evolving towards enabling molecular dynamics simulations of molecules and materials with ab initio accuracy but at a small fraction of the computational cost. However, several challenges remain to be addressed to enable predictive MLFF simulations of realistic molecules, including: (1) developing efficient descriptors for non-local interatomic interactions, which are essential to capture long-range molecular fluctuations, and (2) reducing the dimensionality of the descriptors in kernel methods (or the number of parameters in neural networks) to enhance the applicability and interpretability of MLFFs. Here we propose an automated approach to substantially reduce the number of interatomic descriptor features while preserving the accuracy and increasing the efficiency of MLFFs. To simultaneously address the two stated challenges, we illustrate our approach on the example of the global GDML MLFF; however, our methodology can be equally applied to other models. We found that non-local features (atoms separated by as far as 15 Å in the studied systems) are crucial to retain the overall accuracy of the MLFF for peptides, DNA base pairs, fatty acids, and supramolecular complexes. Interestingly, the number of required non-local features in the reduced descriptors becomes comparable to the number of local interatomic features (those below 5 Å). These results pave the way to constructing global molecular MLFFs whose cost increases linearly, instead of quadratically, with system size.
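The scaling argument in the abstract can be illustrated with a small counting exercise. The sketch below is not the authors' reduction procedure; it only assumes a global descriptor with one feature per atom pair and uses the 5 Å local/non-local boundary quoted above. The toy geometry (`linear_chain`) and all function names are hypothetical, introduced purely for illustration.

```python
import itertools
import math

LOCAL_CUTOFF = 5.0  # Å; local/non-local boundary quoted in the abstract


def pair_features(coords):
    """Return (i, j, distance) for every atom pair (one feature per pair)."""
    return [
        (i, j, math.dist(coords[i], coords[j]))
        for i, j in itertools.combinations(range(len(coords)), 2)
    ]


def split_local_nonlocal(features, cutoff=LOCAL_CUTOFF):
    """Partition pair features into local (< cutoff) and non-local (>= cutoff)."""
    local = [f for f in features if f[2] < cutoff]
    non_local = [f for f in features if f[2] >= cutoff]
    return local, non_local


def linear_chain(n_atoms, spacing=1.5):
    """Hypothetical toy system: atoms on a straight line, `spacing` Å apart."""
    return [(k * spacing, 0.0, 0.0) for k in range(n_atoms)]


if __name__ == "__main__":
    # Print: atom count, all pair features, local features, non-local features.
    for n in (10, 20, 40):
        feats = pair_features(linear_chain(n))
        local, non_local = split_local_nonlocal(feats)
        print(n, len(feats), len(local), len(non_local))
```

For this toy chain the full pair count grows as N(N-1)/2, while the local count grows only as 3N-6. A reduced descriptor that keeps a number of non-local features comparable to the local ones, as the abstract reports, would therefore grow roughly linearly with system size.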
