Paper Title

Integration of Pre-trained Protein Language Models into Geometric Deep Learning Networks

Paper Authors

Fang Wu, Lirong Wu, Dragomir Radev, Jinbo Xu, Stan Z. Li

Paper Abstract

Geometric deep learning has recently achieved great success in non-Euclidean domains, and learning on 3D structures of large biomolecules is emerging as a distinct research area. However, its efficacy is largely constrained by the limited quantity of structural data. Meanwhile, protein language models trained on substantial 1D sequence data have shown burgeoning capabilities with scale in a broad range of applications. Several previous studies consider combining these different protein modalities to promote the representation power of geometric neural networks, but fail to present a comprehensive understanding of their benefits. In this work, we integrate the knowledge learned by well-trained protein language models into several state-of-the-art geometric networks and evaluate them on a variety of protein representation learning benchmarks, including protein-protein interface prediction, model quality assessment, protein-protein rigid-body docking, and binding affinity prediction. Our findings show an overall improvement of 20% over baselines. Strong evidence indicates that incorporating protein language models' knowledge enhances geometric networks' capacity by a significant margin and generalizes to complex tasks.
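The core idea of the abstract can be sketched as follows: obtain per-residue embeddings from a pretrained protein language model and concatenate them with structure-derived node features before feeding the fused features into a geometric network. This is a minimal illustrative sketch, not the authors' implementation; the `plm_embed` lookup table is a hypothetical stand-in for a real PLM (e.g. ESM), which would produce context-dependent embeddings from the full sequence, and the distance-based geometric features are a simplification.

```python
import numpy as np

# Hypothetical stand-in for a pretrained protein language model:
# a fixed embedding per residue type. A real PLM (e.g. ESM) would
# return context-dependent per-residue embeddings for the sequence.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
rng = np.random.default_rng(0)
PLM_DIM = 8
_plm_table = {aa: rng.normal(size=PLM_DIM) for aa in AMINO_ACIDS}

def plm_embed(sequence):
    """Per-residue language-model embeddings, shape (L, PLM_DIM)."""
    return np.stack([_plm_table[aa] for aa in sequence])

def geometric_features(coords, k=3):
    """Simple structural node features: distances to the k nearest residues."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude self-distances
    return np.sort(d, axis=1)[:, :k]     # shape (L, k)

def fuse_node_features(sequence, coords):
    """Concatenate PLM embeddings with geometric features per residue."""
    return np.concatenate(
        [plm_embed(sequence), geometric_features(coords)], axis=1
    )

# Toy example: a 5-residue fragment with random 3D coordinates.
seq = "ACDEF"
coords = rng.normal(size=(len(seq), 3))
x = fuse_node_features(seq, coords)
print(x.shape)  # (5, 11): 8 PLM dims + 3 geometric dims per residue
```

The fused matrix `x` would then serve as the initial node features of a geometric network operating on the residue graph; concatenation is only one fusion strategy, and the paper evaluates such integration across several state-of-the-art geometric architectures.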
