Paper Title

Embedded-physics machine learning for coarse-graining and collective variable discovery without data

Authors

Markus Schöberl, Nicholas Zabaras, Phaedon-Stelios Koutsourelakis

Abstract

We present a novel learning framework that consistently embeds underlying physics while bypassing a significant drawback of most modern, data-driven coarse-grained approaches in the context of molecular dynamics (MD), i.e., the availability of big data. The generation of a sufficiently large training dataset poses a computationally demanding task, while complete coverage of the atomistic configuration space is not guaranteed. As a result, the explorative capabilities of data-driven coarse-grained models are limited and may yield biased "predictive" tools. We propose a novel objective based on reverse Kullback-Leibler divergence that fully incorporates the available physics in the form of the atomistic force field. Rather than separating model learning from the data-generation procedure - the latter relies on simulating atomistic motions governed by force fields - we query the atomistic force field at sample configurations proposed by the predictive coarse-grained model. Thus, learning relies on the evaluation of the force field but does not require any MD simulation. The resulting generative coarse-grained model serves as an efficient surrogate model for predicting atomistic configurations and estimating relevant observables. Beyond obtaining a predictive coarse-grained model, we demonstrate that in the discovered lower-dimensional representation, the collective variables (CVs) are related to physicochemical properties, which are essential for gaining understanding of unexplored complex systems. We demonstrate the algorithmic advances in terms of predictive ability and the physical meaning of the revealed CVs for a bimodal potential energy function and the alanine dipeptide.
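
To make the idea of learning from force evaluations alone more concrete, here is a minimal toy sketch. It is not the authors' model: it assumes a one-dimensional double-well potential U(x) = (x^2 - 1)^2, a single Gaussian q(x) = N(mu, sigma^2) as the model, and an inverse temperature `beta`, all chosen purely for illustration. The reverse Kullback-Leibler divergence KL(q || p) with p(x) proportional to exp(-beta U(x)) equals beta E_q[U] - H[q] up to a constant, so its reparameterized gradient needs only the force F = -dU/dx at samples drawn from q, and no trajectory is ever simulated.

```python
import numpy as np


def force(x):
    """Force F = -dU/dx for the toy double-well potential U(x) = (x**2 - 1)**2."""
    return -4.0 * x * (x**2 - 1.0)


def fit_reverse_kl(beta=4.0, mu=0.5, log_sigma=np.log(0.2),
                   n_samples=512, n_steps=500, lr=0.02, seed=0):
    """Fit q(x) = N(mu, sigma^2) to p(x) ~ exp(-beta * U(x)) by minimizing KL(q || p).

    All names and default values are illustrative assumptions, not taken
    from the paper.
    """
    rng = np.random.default_rng(seed)
    for _ in range(n_steps):
        sigma = np.exp(log_sigma)
        eps = rng.standard_normal(n_samples)
        x = mu + sigma * eps          # configurations proposed by the model
        f = force(x)                  # the only physics query: force evaluations

        # KL(q || p) = beta * E_q[U] - log(sigma) + const, with x = mu + sigma * eps.
        grad_mu = -beta * f.mean()
        grad_log_sigma = -beta * (f * sigma * eps).mean() - 1.0

        # Plain stochastic gradient descent on both parameters.
        mu -= lr * grad_mu
        log_sigma -= lr * grad_log_sigma
    return mu, np.exp(log_sigma)


if __name__ == "__main__":
    mu, sigma = fit_reverse_kl()
    print(f"mu = {mu:.3f}, sigma = {sigma:.3f}")
```

Started at mu = 0.5, the Gaussian settles into the well near x = 1. A single Gaussian can cover only one well, so this toy illustrates the force-only training signal rather than the bimodal result reported in the abstract; capturing both modes requires a more expressive generative model.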
