Paper Title
Two Wrongs Can Make a Right: A Transfer Learning Approach for Chemical Discovery with Chemical Accuracy
Authors
Abstract
Appropriately identifying and treating molecules and materials with significant multi-reference (MR) character is crucial for achieving high data fidelity in virtual high-throughput screening (VHTS). Nevertheless, most VHTS is carried out with approximate density functional theory (DFT) using a single functional. Despite the development of numerous MR diagnostics, the extent to which a single value of such a diagnostic indicates the MR effect on chemical property prediction is not well established. We evaluate MR diagnostics of over 10,000 transition metal complexes (TMCs) and compare them to those in organic molecules. We reveal that only some MR diagnostics are transferable across these materials spaces. By studying the influence of MR character on chemical properties (i.e., MR effect) that involve multiple potential energy surfaces (i.e., adiabatic spin splitting, $ΔE_\mathrm{H-L}$, and ionization potential, IP), we observe that cancellation in MR effect outweighs accumulation. Differences in MR character are more important than the total degree of MR character in predicting MR effect in property prediction. Motivated by this observation, we build transfer learning models to directly predict CCSD(T)-level adiabatic $ΔE_\mathrm{H-L}$ and IP from lower levels of theory. By combining these models with uncertainty quantification and multi-level modeling, we introduce a multi-pronged strategy that accelerates data acquisition by at least a factor of three while achieving chemical accuracy (i.e., 1 kcal/mol) for robust VHTS.