Paper Title

Meta-Learning GNN Initializations for Low-Resource Molecular Property Prediction

Authors

Nguyen, Cuong Q., Kreatsoulas, Constantine, Branson, Kim M.

Abstract

Building in silico models to predict chemical properties and activities is a crucial step in drug discovery. However, limited labeled data often hinders the application of deep learning in this setting. Meanwhile, advances in meta-learning have enabled state-of-the-art performance on few-shot learning benchmarks, naturally prompting the question: can meta-learning improve deep learning performance in low-resource drug discovery projects? In this work, we assess the transferability of graph neural network initializations learned by the Model-Agnostic Meta-Learning (MAML) algorithm and its variants FO-MAML and ANIL on chemical property and activity tasks. Using the ChEMBL20 dataset to emulate low-resource settings, our benchmark shows that meta-initializations perform comparably to or outperform multi-task pre-training baselines on 16 out of 20 in-distribution tasks and on all out-of-distribution tasks, providing average improvements in AUPRC of 11.2% and 26.9%, respectively. Finally, we observe that meta-initializations consistently yield the best performing models across fine-tuning sets with $k \in \{16, 32, 64, 128, 256\}$ instances.
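To make the meta-initialization idea concrete, the following is a minimal sketch of FO-MAML (the first-order MAML variant named in the abstract) on toy scalar regression tasks, not the paper's GNN setup. The task family `y = a * x`, the learning rates, and all function names here are illustrative assumptions: each inner loop adapts the shared initialization to one sampled task, and the outer loop updates the initialization using the post-adaptation gradient, ignoring second-order terms.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_grad(w, x, y):
    """Squared-error loss and its gradient for a scalar linear model y_hat = w * x."""
    err = w * x - y
    return np.mean(err ** 2), np.mean(2 * err * x)

def fomaml(meta_iters=500, tasks_per_batch=8, inner_lr=0.05, outer_lr=0.01):
    """First-order MAML on toy regression tasks y = a * x, slope a varying per task."""
    w = 0.0  # the meta-initialization being learned
    for _ in range(meta_iters):
        outer_grad = 0.0
        for _ in range(tasks_per_batch):
            a = rng.uniform(1.0, 3.0)               # sample a task
            x = rng.normal(size=16); y = a * x      # support (fine-tuning) set
            _, g = loss_grad(w, x, y)
            w_adapted = w - inner_lr * g            # one inner-loop adaptation step
            xq = rng.normal(size=16); yq = a * xq   # query (evaluation) set
            _, gq = loss_grad(w_adapted, xq, yq)
            outer_grad += gq  # first-order: treat d(w_adapted)/dw as identity
        w -= outer_lr * outer_grad / tasks_per_batch
    return w

w_init = fomaml()
print(w_init)  # settles near the center of the task distribution, so one step adapts well
```

Full MAML would backpropagate through the inner update (a second-order term); FO-MAML drops that term, which is what makes the outer update above a plain average of query-set gradients.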
