Paper title
Structure-Property Maps with Kernel Principal Covariates Regression
Authors
Abstract
Data analyses based on linear methods constitute the simplest, most robust, and most transparent approaches to the automatic processing of large amounts of data for building supervised or unsupervised machine learning models. Principal covariates regression (PCovR) is an underappreciated method that interpolates between principal component analysis and linear regression, and can be used to conveniently reveal structure-property relations in terms of simple-to-interpret, low-dimensional maps. Here we provide a pedagogic overview of these data analysis schemes, including the use of the kernel trick to introduce an element of non-linearity, while maintaining most of the convenience and the simplicity of linear approaches. We then introduce a kernelized version of PCovR and a sparsified extension, and demonstrate the performance of this approach in revealing and predicting structure-property relations in chemistry and materials science, showing a variety of examples including elemental carbon, porous silicate frameworks, organic molecules, amino acid conformers, and molecular materials.
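The short NumPy sketch below makes the interpolation between PCA and linear regression concrete. It follows a sample-space formulation of PCovR. A modified Gram matrix K_tilde = α XXᵀ + (1 − α) Ŷ Ŷᵀ is built, where Ŷ is a ridge-regression fit of Y from X. This matrix is diagonalised, and its top eigenpairs give the latent projections T. A kernelized variant replaces XXᵀ with a centred kernel matrix K and replaces the linear fit with kernel ridge regression. Several parts of the sketch are illustrative assumptions rather than the paper's own implementation: the function names (`pcovr_sample_space`, `kernel_pcovr`, `center_kernel`, `rbf_kernel`), the small ridge regularization, and the Gaussian kernel. The normalization of the two terms and the sparse extension used in the paper are not reproduced here.

```python
import numpy as np


def _top_projections(K_tilde, n_components):
    """Latent projections T = U Lambda^{1/2} from the top eigenpairs of K_tilde."""
    eigvals, eigvecs = np.linalg.eigh(K_tilde)  # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:n_components]  # keep the largest ones
    lam = np.clip(eigvals[idx], 0.0, None)  # guard against round-off negatives
    return eigvecs[:, idx] * np.sqrt(lam)


def pcovr_sample_space(X, Y, alpha=0.5, n_components=2, regularization=1e-6):
    """Sample-space PCovR sketch (assumption: X (n, p) and Y (n, t) centred, Y 2-D)."""
    n_features = X.shape[1]
    # Ridge-regression fit: Yhat = X (X^T X + r I)^{-1} X^T Y
    P_XY = np.linalg.solve(X.T @ X + regularization * np.eye(n_features), X.T @ Y)
    Yhat = X @ P_XY
    # alpha = 1 keeps only the Gram matrix (PCA); alpha = 0 only the regression term
    K_tilde = alpha * (X @ X.T) + (1.0 - alpha) * (Yhat @ Yhat.T)
    return _top_projections(K_tilde, n_components), Yhat


def center_kernel(K):
    """Double-centre a kernel matrix: H K H with H = I - 1/n."""
    n = K.shape[0]
    H = np.eye(n) - np.full((n, n), 1.0 / n)
    return H @ K @ H


def rbf_kernel(X, gamma=1.0):
    """Gaussian kernel exp(-gamma * |x_i - x_j|^2); an illustrative choice of kernel."""
    sq = np.sum(X**2, axis=1)
    d2 = np.clip(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0, None)
    return np.exp(-gamma * d2)


def kernel_pcovr(K, Y, alpha=0.5, n_components=2, regularization=1e-6):
    """Kernel PCovR sketch: X X^T -> centred kernel K, linear fit -> kernel ridge."""
    n_samples = K.shape[0]
    # Kernel ridge fit: Yhat = K (K + r I)^{-1} Y
    P_KY = np.linalg.solve(K + regularization * np.eye(n_samples), Y)
    Yhat = K @ P_KY
    K_tilde = alpha * K + (1.0 - alpha) * (Yhat @ Yhat.T)
    return _top_projections(K_tilde, n_components), Yhat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 4))
    X -= X.mean(axis=0)
    Y = np.sin(X[:, :1]) + 0.05 * rng.normal(size=(100, 1))
    Y -= Y.mean(axis=0)
    K = center_kernel(rbf_kernel(X, gamma=0.5))
    T, Yhat = kernel_pcovr(K, Y, alpha=0.5, n_components=2)
    print(T.shape)  # two latent coordinates per sample
```

With alpha = 1, the projections coincide with the PCA scores of X, up to the sign of each component. With alpha = 0 and a single target, the one-dimensional projection is, up to sign, the regression prediction Ŷ itself. Intermediate values of alpha trade off how well the map preserves the structure of X against how well it correlates with the property Y.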