Paper Title

Meta-Learning GNN Initializations for Low-Resource Molecular Property Prediction

Authors

Nguyen, Cuong Q., Kreatsoulas, Constantine, Branson, Kim M.

Abstract

Building in silico models to predict chemical properties and activities is a crucial step in drug discovery. However, limited labeled data often hinders the application of deep learning in this setting. Meanwhile, advances in meta-learning have enabled state-of-the-art performance on few-shot learning benchmarks, naturally prompting the question: can meta-learning improve deep learning performance in low-resource drug discovery projects? In this work, we assess the transferability of graph neural network initializations learned by the Model-Agnostic Meta-Learning (MAML) algorithm and its variants FO-MAML and ANIL on chemical property and activity tasks. Using the ChEMBL20 dataset to emulate low-resource settings, our benchmark shows that meta-initializations perform comparably to or outperform multi-task pre-training baselines on 16 out of 20 in-distribution tasks and on all out-of-distribution tasks, providing average improvements in AUPRC of 11.2% and 26.9%, respectively. Finally, we observe that meta-initializations consistently yield the best performing models across fine-tuning sets with $k \in \{16, 32, 64, 128, 256\}$ instances.
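To make the meta-initialization idea concrete, the following is a minimal sketch of FO-MAML (the first-order MAML variant named in the abstract) on toy scalar regression tasks, not the paper's GNN setup. The task family `y = a * x`, the learning rates, and all function names here are illustrative assumptions: each inner loop adapts the shared initialization to one sampled task, and the outer loop updates the initialization using the post-adaptation gradient, ignoring second-order terms.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_grad(w, x, y):
    """Squared-error loss and its gradient for a scalar linear model y_hat = w * x."""
    err = w * x - y
    return np.mean(err ** 2), np.mean(2 * err * x)

def fomaml(meta_iters=500, tasks_per_batch=8, inner_lr=0.05, outer_lr=0.01):
    """First-order MAML on toy regression tasks y = a * x, slope a varying per task."""
    w = 0.0  # the meta-initialization being learned
    for _ in range(meta_iters):
        outer_grad = 0.0
        for _ in range(tasks_per_batch):
            a = rng.uniform(1.0, 3.0)               # sample a task
            x = rng.normal(size=16); y = a * x      # support (fine-tuning) set
            _, g = loss_grad(w, x, y)
            w_adapted = w - inner_lr * g            # one inner-loop adaptation step
            xq = rng.normal(size=16); yq = a * xq   # query (evaluation) set
            _, gq = loss_grad(w_adapted, xq, yq)
            outer_grad += gq  # first-order: treat d(w_adapted)/dw as identity
        w -= outer_lr * outer_grad / tasks_per_batch
    return w

w_init = fomaml()
print(w_init)  # settles near the center of the task distribution, so one step adapts well
```

Full MAML would backpropagate through the inner update (a second-order term); FO-MAML drops that term, which is what makes the outer update above a plain average of query-set gradients.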
