Ensemble Learning of Coarse-Grained Molecular Dynamics Force Fields with a Kernel Approach

Paper Title

Ensemble Learning of Coarse-Grained Molecular Dynamics Force Fields with a Kernel Approach

Authors

Jiang Wang, Stefan Chmiela, Klaus-Robert Müller, Frank Noé, Cecilia Clementi

Abstract

Gradient-domain machine learning (GDML) is an accurate and efficient approach to learning a molecular potential and its associated force field, based on the kernel ridge regression algorithm. Here, we demonstrate its application to learning an effective coarse-grained (CG) model from all-atom simulation data in a sample-efficient manner. The coarse-grained force field is learned by following the thermodynamic consistency principle, here by minimizing the error between the predicted coarse-grained force and the all-atom mean force in the coarse-grained coordinates. Solving this problem with GDML directly is impossible because coarse-graining requires averaging over many training data points, resulting in impractical memory requirements for storing the kernel matrices. In this work, we propose a data-efficient and memory-saving alternative. Using ensemble learning and stratified sampling, we introduce a two-layer training scheme that enables GDML to learn an effective coarse-grained model. We illustrate our method on a simple biomolecular system, alanine dipeptide, by reconstructing the free energy landscape of a coarse-grained variant of this molecule. Our novel GDML training scheme yields a smaller free energy error than neural networks when the training set is small, and a comparably high accuracy when the training set is sufficiently large.
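The idea of combining ensemble learning with stratified sampling can be illustrated with a minimal sketch, which is not the authors' implementation. It makes several simplifying assumptions: it uses plain kernel ridge regression on scalar forces in one dimension rather than GDML's gradient-domain kernel, a toy double-well potential with synthetic noisy forces rather than all-atom simulation data, and a simple average over models rather than the paper's two-layer scheme. All function names and parameter values (`stratified_batch`, `fit_ensemble`, `lam`, `length_scale`, bin counts) are hypothetical choices for illustration. Each small model only needs a kernel matrix of batch size, which is the memory saving the abstract refers to.

```python
import numpy as np


def rbf_kernel(a, b, length_scale):
    """Gaussian kernel matrix between 1-D coordinate arrays a (n,) and b (m,)."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff**2 / (2.0 * length_scale**2))


def fit_krr(x, f, lam, length_scale):
    """Fit one kernel ridge regressor by solving (K + lam * I) alpha = f."""
    k = rbf_kernel(x, x, length_scale)
    alpha = np.linalg.solve(k + lam * np.eye(len(x)), f)
    return x, alpha


def stratified_batch(x, rng, n_bins, per_bin):
    """Draw up to per_bin sample indices from each equal-width bin of x."""
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    bin_of = np.digitize(x, edges[1:-1])  # bin index in 0 .. n_bins - 1
    picks = []
    for b in range(n_bins):
        members = np.flatnonzero(bin_of == b)
        if len(members) > 0:
            take = min(per_bin, len(members))
            picks.append(rng.choice(members, size=take, replace=False))
    return np.concatenate(picks)


def fit_ensemble(x, f, rng, n_models, n_bins, per_bin, lam, length_scale):
    """Train n_models small regressors, each on its own stratified batch."""
    models = []
    for _ in range(n_models):
        idx = stratified_batch(x, rng, n_bins, per_bin)
        models.append(fit_krr(x[idx], f[idx], lam, length_scale))
    return models


def predict_ensemble(models, x_new, length_scale):
    """Average the force predictions of all ensemble members."""
    preds = [rbf_kernel(x_new, xt, length_scale) @ alpha for xt, alpha in models]
    return np.mean(preds, axis=0)


def mean_force(q):
    """Exact force -dU/dq for the toy potential U(q) = (q^2 - 1)^2."""
    return -4.0 * q * (q**2 - 1.0)


# Toy data (assumption): coordinates clustered around two wells, with noisy
# "instantaneous" forces scattered around the true mean force.
rng = np.random.default_rng(0)
x = rng.choice([-1.0, 1.0], size=4000) + 0.45 * rng.standard_normal(4000)
x = x[np.abs(x) <= 1.5]
noise_std = 2.0
f = mean_force(x) + noise_std * rng.standard_normal(len(x))

models = fit_ensemble(x, f, rng, n_models=10, n_bins=15, per_bin=15,
                      lam=0.1, length_scale=0.5)

# Compare the averaged prediction with the known mean force on a test grid.
grid = np.linspace(-1.2, 1.2, 50)
pred = predict_ensemble(models, grid, length_scale=0.5)
rmse = float(np.sqrt(np.mean((pred - mean_force(grid)) ** 2)))
print(f"ensemble RMSE against the true mean force: {rmse:.3f}")
```

Regressing on noisy instantaneous forces with a squared-error loss targets their conditional average, which is why the sketch compares the ensemble prediction with the known mean force. Stratifying by coordinate bins keeps sparsely visited regions, such as the barrier between the two wells, represented in every batch.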
