Paper Title

Augmenting Molecular Images with Vector Representations as a Featurization Technique for Drug Classification

Authors

Daniel de Marchi, Amarjit Budhiraja

Abstract

One of the key steps in building deep learning systems for drug classification and generation is the choice of featurization for the molecules. Previous featurization methods have included molecular images, binary strings, graphs, and SMILES strings. This paper proposes creating molecular images captioned with binary vectors that encode information not contained in, or not easily understood from, a molecular image alone. Specifically, we use Morgan fingerprints, which encode higher-level structural information, and MACCS keys, which encode yes-or-no questions about a molecule's properties and structure. We tested our method on the HIV dataset published by the Pande lab, which consists of 41,127 molecules labeled by whether they inhibit HIV. Our final model achieved a state-of-the-art ROC AUC on the HIV dataset, outperforming all other methods. Moreover, the model converged significantly faster than most other methods and required dramatically less computational power than training on unaugmented images.
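To make the idea of "captioning" an image with a binary vector concrete, here is a minimal sketch in plain Python. The abstract does not say how the vectors are laid out inside the image, so the layout below (a strip of black and white blocks appended under the picture) is purely an assumption, and the names `bits_to_strip` and `caption_image` are hypothetical. The toy bit list stands in for a Morgan fingerprint or MACCS key vector, which in practice would come from a cheminformatics toolkit.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels, row-major, values 0..255


def bits_to_strip(bits: List[int], width: int, cell: int = 2) -> Image:
    """Render a binary vector as cell x cell blocks, wrapped to `width` pixels.

    Assumed layout (not taken from the paper): 1 -> white block, 0 -> black block.
    """
    per_row = width // cell  # number of bits that fit on one strip row
    if per_row == 0:
        raise ValueError("cell size exceeds image width")
    strip: Image = []
    for start in range(0, len(bits), per_row):
        row: List[int] = []
        for bit in bits[start:start + per_row]:
            row.extend([255 if bit else 0] * cell)
        row.extend([0] * (width - len(row)))  # pad the last row with black
        strip.extend(list(row) for _ in range(cell))  # repeat to make square blocks
    return strip


def caption_image(image: Image, bits: List[int], cell: int = 2) -> Image:
    """Return a copy of `image` with the rendered bit strip appended underneath."""
    width = len(image[0])
    return [list(r) for r in image] + bits_to_strip(bits, width, cell)


# Toy example: a 4 x 8 gray "molecule image" and a 6-bit placeholder fingerprint.
image = [[128] * 8 for _ in range(4)]
bits = [1, 0, 1, 1, 0, 0]
augmented = caption_image(image, bits, cell=2)
```

The augmented array keeps the original pixels untouched and adds the vector as extra rows, so a standard image classifier could consume both sources of information through a single input. Whether the authors place the caption below, beside, or inside the image is not stated in the abstract.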
