Paper title
Predicting and interpreting oxide glass properties by machine learning using large datasets
Paper authors
Abstract
With the advent of powerful computer simulation techniques, it is time to move from the widely used knowledge-guided empirical methods to approaches driven by data science, mainly machine learning algorithms. We investigated the predictive performance of three machine learning algorithms for six different glass properties. For this purpose, we used an extensive dataset of about 150,000 oxide glasses, which was split into smaller datasets, one for each property investigated. Using the decision tree induction, k-nearest neighbors, and random forest algorithms, selected from a previous study of six algorithms, we induced predictive models for the glass transition temperature, liquidus temperature, elastic modulus, thermal expansion coefficient, refractive index, and Abbe number. Moreover, each model was induced with both default and tuned hyperparameter values. We demonstrate that, apart from the elastic modulus (which had the smallest training dataset), the induced predictive models for the other five properties yield an uncertainty comparable to the usual data spread. However, for glasses with extremely low or high values of these properties, the prediction uncertainty is significantly higher. Finally, as expected, glasses containing chemical elements that are poorly represented in the training set yielded higher prediction errors. The method developed here calls attention to the successes and possible pitfalls of machine learning algorithms. The analysis of the SHAP values indicated the key elements that increase or decrease the value of the modeled properties, and also estimated the maximum possible increase or decrease. Insights gained by this analysis can help empirical compositional tuning and computer-aided inverse design of glass formulations.
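To make the "default versus tuned hyperparameter" comparison in the abstract concrete, the following is a minimal sketch for one of the three algorithms, k-nearest neighbors regression. It is not the authors' pipeline: the data are synthetic three-component compositions with an invented property response, and the names `make_synthetic_glasses`, `DEFAULT_K`, and `candidate_ks` are hypothetical choices made only for illustration.

```python
import math
import random


def knn_predict(train_X, train_y, x, k):
    """Predict a property as the mean over the k nearest training compositions."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def rmse(train_X, train_y, test_X, test_y, k):
    """Root-mean-square error of the k-NN regressor on a held-out set."""
    sq = [(knn_predict(train_X, train_y, x, k) - y) ** 2
          for x, y in zip(test_X, test_y)]
    return math.sqrt(sum(sq) / len(sq))


def make_synthetic_glasses(n, rng):
    """Hypothetical data: 3-component molar fractions and a made-up property."""
    X, y = [], []
    for _ in range(n):
        raw = [rng.random() for _ in range(3)]
        total = sum(raw)
        frac = [r / total for r in raw]  # fractions sum to 1
        # Invented response with Gaussian noise; NOT a real glass model.
        value = (500 + 300 * frac[0] - 200 * frac[1]
                 + 150 * frac[0] * frac[2] + rng.gauss(0, 5))
        X.append(frac)
        y.append(value)
    return X, y


rng = random.Random(0)
X, y = make_synthetic_glasses(300, rng)
train_X, train_y = X[:200], y[:200]
val_X, val_y = X[200:], y[200:]

DEFAULT_K = 5                    # assumed "default" number of neighbors
candidate_ks = [1, 3, 5, 9, 15]  # hypothetical tuning grid

default_rmse = rmse(train_X, train_y, val_X, val_y, DEFAULT_K)
scores = {k: rmse(train_X, train_y, val_X, val_y, k) for k in candidate_ks}
tuned_k = min(scores, key=scores.get)  # k with the lowest validation RMSE
tuned_rmse = scores[tuned_k]

print(f"default k={DEFAULT_K}: RMSE={default_rmse:.2f}")
print(f"tuned   k={tuned_k}: RMSE={tuned_rmse:.2f}")
```

Because the grid includes the default value, the tuned error on the validation set can never exceed the default error. In a real study, the tuned model would also need to be scored on a separate test set, since selecting `k` on the validation data makes that estimate optimistic.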