Paper Title
BERT Learns (and Teaches) Chemistry
Paper Authors
Paper Abstract
Modern computational organic chemistry is becoming increasingly data-driven. There remain a large number of important unsolved problems in this area, such as product prediction given reactants, drug discovery, and metric-optimized molecule synthesis, but efforts to solve these problems using machine learning have also increased in recent years. In this work, we propose the use of attention to study functional groups and other property-impacting molecular substructures from a data-driven perspective, using a transformer-based model (BERT) on datasets of string representations of molecules and analyzing the behavior of its attention heads. We then apply the representations of functional groups and atoms learned by the model to tackle problems of toxicity, solubility, drug-likeness, and synthesis accessibility on smaller datasets, using the learned representations as features for graph convolution and attention models on the graph structure of molecules, as well as fine-tuning of BERT. Finally, we propose the use of attention visualization as a helpful tool for chemistry practitioners and students to quickly identify substructures that are important for various chemical properties.
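The abstract does not include code, but as a rough illustration of the attention-analysis idea it describes, the sketch below runs a BERT-style model over a SMILES string with Hugging Face transformers and inspects which tokens receive the most attention in each layer. The checkpoint name (bert-base-uncased), the use of a generic WordPiece tokenizer on SMILES, and the head-averaging scheme are all illustrative assumptions, not the paper's actual setup, which would use a model pretrained on molecular string data.

```python
# Minimal sketch: extract per-layer attention weights from a BERT-style
# model over a SMILES string and report the most-attended token.
# NOTE: the checkpoint and tokenizer below are placeholders, not the
# chemistry-pretrained model the paper describes.
import torch
from transformers import BertTokenizerFast, BertModel

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")  # placeholder vocab
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

smiles = "CC(=O)Oc1ccccc1C(=O)O"  # aspirin, as a string representation
inputs = tokenizer(smiles, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one (batch, heads, seq, seq) tensor per layer.
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for layer_idx, layer_attn in enumerate(outputs.attentions):
    # Average over heads, then sum over query positions: total attention
    # each token *receives* in this layer.
    received = layer_attn[0].mean(dim=0).sum(dim=0)
    top = received.argmax().item()
    print(f"layer {layer_idx}: most-attended token = {tokens[top]!r}")
```

In the same spirit, the downstream property-prediction experiments (toxicity, solubility, drug-likeness, synthesis accessibility) could be approached by attaching a classification head to such a model and fine-tuning on the smaller labeled datasets, e.g. via transformers' BertForSequenceClassification; the specific datasets and heads used are described in the paper itself, not here.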