Title
Self-supervised Learning for Label Sparsity in Computational Drug Repositioning
Authors
Abstract
Computational drug repositioning aims to discover new uses for marketed drugs; it can accelerate the drug development process and plays an important role in existing drug discovery systems. However, the number of validated drug-disease associations is small compared with the number of drugs and diseases in the real world. With so few labeled samples, a classification model cannot learn effective latent factors for drugs, which results in poor generalization. In this work, we propose a multi-task self-supervised learning framework for computational drug repositioning. The framework tackles label sparsity by learning better drug representations. Specifically, we treat drug-disease association prediction as the main task; the auxiliary task uses data augmentation strategies and contrastive learning to mine the internal relationships among the original drug features, so that better drug representations are learned automatically without supervised labels. Joint training ensures that the auxiliary task improves the prediction accuracy of the main task: the auxiliary task improves the drug representations and serves as additional regularization that improves generalization. Furthermore, we design a multi-input decoding network to improve the reconstruction ability of the autoencoder model. We evaluate our model on three real-world datasets. The experimental results demonstrate the effectiveness of the multi-task self-supervised learning framework, whose predictive ability is superior to that of state-of-the-art models.
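The abstract describes the main and auxiliary tasks only at a high level. The following is a minimal Python sketch of what such a joint objective could look like, under several assumptions that do not come from the paper: random feature masking as the augmentation, a shared linear-plus-tanh encoder, an InfoNCE contrastive loss over two augmented views of each drug, a dot-product association score, and a weighted sum `L = L_main + lam * L_cl`. The names `mask_features`, `encode`, `info_nce`, `joint_loss`, and the weight `lam` are hypothetical, and the multi-input decoder and autoencoder reconstruction are omitted.

```python
import math
import random

def mask_features(x, drop_prob, rng):
    """Assumed augmentation: randomly zero out entries of a drug feature vector."""
    return [0.0 if rng.random() < drop_prob else v for v in x]

def encode(x, weights):
    """Hypothetical shared encoder: one linear output per row of `weights`, then tanh."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]

def l2_normalize(z):
    norm = math.sqrt(sum(v * v for v in z)) or 1.0  # avoid division by zero
    return [v / norm for v in z]

def info_nce(views_a, views_b, temperature=0.5):
    """InfoNCE: view A of drug i should match view B of drug i, not of other drugs."""
    a = [l2_normalize(z) for z in views_a]
    b = [l2_normalize(z) for z in views_b]
    total = 0.0
    for i, zi in enumerate(a):
        logits = [sum(p * q for p, q in zip(zi, zj)) / temperature for zj in b]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(s - m) for s in logits))
        total += log_denom - logits[i]
    return total / len(a)

def sigmoid(t):
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def bce(scores, labels, eps=1e-7):
    """Main-task loss: binary cross-entropy on association probabilities."""
    total = 0.0
    for p, y in zip(scores, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(scores)

def joint_loss(drug_feats, disease_embs, pairs, weights, lam=0.1, drop_prob=0.2, seed=0):
    """Assumed multi-task objective: L = L_main + lam * L_contrastive.

    `pairs` holds labelled (drug_index, disease_index, label) triples.
    """
    rng = random.Random(seed)
    # Main task: score each labelled pair from the shared drug representation.
    reps = [encode(x, weights) for x in drug_feats]
    scores = [sigmoid(sum(r * d for r, d in zip(reps[i], disease_embs[j])))
              for i, j, _ in pairs]
    labels = [y for _, _, y in pairs]
    # Auxiliary task: two masked views per drug; no association labels needed.
    views_a = [encode(mask_features(x, drop_prob, rng), weights) for x in drug_feats]
    views_b = [encode(mask_features(x, drop_prob, rng), weights) for x in drug_feats]
    return bce(scores, labels) + lam * info_nce(views_a, views_b)
```

Because the contrastive term uses every drug, including those with no validated associations, it gives the shared encoder a training signal beyond the sparse labels. Calling `joint_loss(..., lam=0.0)` recovers the purely supervised loss, which makes the regularizing contribution of the auxiliary term easy to isolate in an experiment.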