Paper Title
High-throughput discovery of chemical structure-polarity relationships combining automation and machine learning techniques
Paper Authors
Abstract
As an essential attribute of organic compounds, polarity has a profound influence on many molecular properties, such as solubility and phase transition temperature. Thin layer chromatography (TLC) is a commonly used technique for polarity measurement. However, current TLC practice has several problems, including the many trial runs needed to find suitable conditions and irreproducibility due to a lack of standardization. Herein, we describe an automated experiment system for TLC analysis. This system is designed to conduct TLC analysis automatically, facilitating high-throughput experimentation by collecting large experimental datasets under standardized conditions. Using these datasets, machine learning (ML) methods are employed to construct surrogate models that correlate the structures of organic compounds with their polarity, as measured by the retardation factor (Rf). The trained ML models are able to predict the Rf value curve of organic compounds with high accuracy. Furthermore, the constitutive relationship between a compound and its polarity can also be discovered through these modeling methods, and the underlying mechanism is rationalized through adsorption theories. The trained ML models not only reduce the need for the empirical optimization currently required for TLC analysis, but also provide general guidelines for the selection of conditions, making TLC an easily accessible tool for the broad scientific community.
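For readers unfamiliar with the quantity the models predict, the retardation factor (Rf) is conventionally defined as the distance travelled by the compound spot divided by the distance travelled by the solvent front, both measured from the baseline. The following is a minimal sketch of that standard definition only; the function name, parameter names, and units are illustrative assumptions and are not taken from the paper's automated system.

```python
def retardation_factor(spot_distance_mm: float, solvent_front_distance_mm: float) -> float:
    """Return Rf = (distance moved by the spot) / (distance moved by the solvent front).

    Both distances are measured from the baseline of the TLC plate.
    Names and units here are hypothetical, chosen only for illustration.
    """
    if solvent_front_distance_mm <= 0:
        raise ValueError("solvent front distance must be positive")
    if not 0 <= spot_distance_mm <= solvent_front_distance_mm:
        raise ValueError("spot distance must lie between 0 and the solvent front distance")
    return spot_distance_mm / solvent_front_distance_mm


# Example: a spot that travels 20 mm while the solvent front travels 50 mm.
rf = retardation_factor(20.0, 50.0)
print(f"Rf = {rf:.2f}")
```

Under this definition Rf always lies between 0 and 1. On a polar stationary phase such as silica gel, more polar compounds are generally retained more strongly and so tend to show lower Rf values in a given eluent, which is why Rf can serve as a proxy for polarity.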