Title

Improving Molecular Pretraining with Complementary Featurizations

Authors

Zhu, Yanqiao; Chen, Dingshuo; Du, Yuanqi; Wang, Yingze; Liu, Qiang; Wu, Shu

Abstract

Molecular pretraining, which learns molecular representations over massive unlabeled data, has become a prominent paradigm for solving a variety of tasks in computational chemistry and drug discovery. Recently, substantial progress has been made in molecular pretraining with different molecular featurizations, including 1D SMILES strings, 2D graphs, and 3D geometries. However, the role of molecular featurizations and their corresponding neural architectures in molecular pretraining remains largely unexamined. In this paper, through two case studies (chirality classification and aromatic ring counting), we first demonstrate that different featurization techniques convey chemical information differently. In light of this observation, we propose a simple and effective MOlecular pretraining framework with COmplementary featurizations (MOCO). MOCO comprehensively leverages multiple featurizations that complement each other, and it outperforms existing state-of-the-art models that rely on only one or two featurizations across a wide range of molecular property prediction tasks.
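The chirality case study rests on a geometric fact that can be checked in a few lines: two mirror-image molecules (enantiomers) have identical connectivity, so a plain 2D graph without stereo annotations cannot tell them apart, whereas 3D coordinates can. The sketch below is a toy illustration under stated assumptions, not the paper's case-study setup or MOCO's code. The molecule (a carbon "C" with four hypothetical substituents "a" to "d"), its idealized tetrahedral coordinates, and the helper names `mirror`, `graph_edges`, and `signed_volume` are all made up for this example. Real SMILES strings and molecular graphs can encode chirality explicitly (for example via `@`/`@@` tags), so the point applies only to featurizations that omit such annotations.

```python
from typing import Dict, FrozenSet, Tuple

Vec = Tuple[float, float, float]

# Hypothetical toy molecule (illustration only, not from the paper): a central
# carbon "C" bonded to four distinct substituents "a"-"d" placed on the corners
# of an idealized regular tetrahedron.
TETRAHEDRON: Dict[str, Vec] = {
    "C": (0.0, 0.0, 0.0),
    "a": (1.0, 1.0, 1.0),
    "b": (1.0, -1.0, -1.0),
    "c": (-1.0, 1.0, -1.0),
    "d": (-1.0, -1.0, 1.0),
}


def mirror(coords: Dict[str, Vec]) -> Dict[str, Vec]:
    """Reflect every atom through the yz-plane (x -> -x), yielding the enantiomer."""
    return {name: (-x, y, z) for name, (x, y, z) in coords.items()}


def graph_edges(center: str, coords: Dict[str, Vec]) -> FrozenSet[FrozenSet[str]]:
    """Connectivity-only 2D view: the center-neighbor bonds, with all geometry dropped."""
    return frozenset(frozenset((center, name)) for name in coords if name != center)


def signed_volume(coords: Dict[str, Vec], center: str, n1: str, n2: str, n3: str) -> float:
    """Triple product of three bond vectors from `center`; its sign encodes handedness."""
    cx, cy, cz = coords[center]
    u, v, w = (
        (coords[n][0] - cx, coords[n][1] - cy, coords[n][2] - cz) for n in (n1, n2, n3)
    )
    return (
        u[0] * (v[1] * w[2] - v[2] * w[1])
        - u[1] * (v[0] * w[2] - v[2] * w[0])
        + u[2] * (v[0] * w[1] - v[1] * w[0])
    )


if __name__ == "__main__":
    left, right = TETRAHEDRON, mirror(TETRAHEDRON)
    # Same edge set: a connectivity-only graph cannot separate the enantiomers.
    print(graph_edges("C", left) == graph_edges("C", right))  # True
    # Opposite signs: the 3D coordinates do separate them.
    print(signed_volume(left, "C", "a", "b", "c"))   # 4.0
    print(signed_volume(right, "C", "a", "b", "c"))  # -4.0
```

The reflection leaves the edge set unchanged but flips the sign of the triple product, which is one concrete way a geometric featurization can carry information that a connectivity-only one lacks.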