Paper title
Structured Multi-task Learning for Molecular Property Prediction
Paper authors
Paper abstract
Multi-task learning for molecular property prediction is becoming increasingly important in drug discovery. However, in contrast to other domains, the performance of multi-task learning in drug discovery is still unsatisfactory because the amount of labeled data for each task is too limited, which calls for additional data to compensate for the scarcity. In this paper, we study multi-task learning for molecular property prediction in a novel setting, where a relation graph between tasks is available. We first construct a dataset (ChEMBL-STRING) that includes around 400 tasks as well as a task relation graph. Then, to better utilize this relation graph, we propose a method called SGNN-EBM to systematically investigate structured task modeling from two perspectives. (1) In the latent space, we model the task representations by applying a state graph neural network (SGNN) on the relation graph. (2) In the output space, we employ structured prediction with an energy-based model (EBM), which can be efficiently trained through the noise-contrastive estimation (NCE) approach. Empirical results justify the effectiveness of SGNN-EBM. Code is available at https://github.com/chao1224/SGNN-EBM.
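To make the output-space idea more concrete, the following is a minimal sketch of how an energy over joint binary task labels could be scored with an NCE-style objective. It is not the authors' implementation: the energy parameterization (one unary table per task plus one pairwise table per edge of the relation graph), the noise distribution passed in as `log_q`, and all function names are assumptions made for illustration. In a full model the unary and pairwise tables would presumably be produced by a network from the molecule and task representations; here they are plain arrays.

```python
import numpy as np


def energy(y, unary, pairwise, edges):
    """Energy of a joint label vector y (one 0/1 label per task).

    Assumed form (illustrative only):
    E(y) = -sum_i unary[i, y_i] - sum_{e=(i,j)} pairwise[e, y_i, y_j]
    """
    idx = np.arange(len(y))
    e = -unary[idx, y].sum()
    for k, (i, j) in enumerate(edges):
        e -= pairwise[k, y[i], y[j]]
    return e


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    return -np.logaddexp(0.0, -x)


def nce_loss(y_data, y_noise, unary, pairwise, edges, log_q):
    """NCE loss for one observed label vector and a list of noise label vectors.

    The unnormalized model log-density is taken to be -E(y); the classifier
    logit is log p_model(y) - log(nu * q(y)), with nu noise samples per datum.
    """
    nu = len(y_noise)

    def logit(y):
        return -energy(y, unary, pairwise, edges) - (np.log(nu) + log_q(y))

    # The observed sample should be classified as "data".
    loss = -log_sigmoid(logit(y_data))
    # Each noise sample should be classified as "noise":
    # log(1 - sigmoid(x)) equals log_sigmoid(-x).
    for y in y_noise:
        loss -= log_sigmoid(-logit(y))
    return loss
```

Minimizing this loss with respect to the tables pushes the energy of observed label vectors down relative to the noise samples, without ever computing the partition function over all 2^T label combinations, which is the usual motivation for training energy-based models with NCE.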