PersGNN: Applying Topological Data Analysis and Geometric Deep Learning to Structure-Based Protein Function Prediction

Paper Title

PersGNN: Applying Topological Data Analysis and Geometric Deep Learning to Structure-Based Protein Function Prediction

Authors

Swenson, Nicolas; Krishnapriyan, Aditi S.; Buluc, Aydin; Morozov, Dmitriy; Yelick, Katherine

Abstract

Understanding protein structure-function relationships is a key challenge in computational biology, with applications across the biotechnology and pharmaceutical industries. While it is known that protein structure directly impacts protein function, many functional prediction tasks use only protein sequence. In this work, we isolate protein structure to make functional annotations for proteins in the Protein Data Bank in order to study the expressiveness of different structure-based prediction schemes. We present PersGNN, an end-to-end trainable deep learning model that combines graph representation learning with topological data analysis to capture a complex set of both local and global structural features. While variations of these techniques have been successfully applied to proteins before, we demonstrate that our hybridized approach, PersGNN, outperforms either method on its own as well as a baseline neural network that learns from the same information. PersGNN achieves a 9.3% boost in area under the precision-recall curve (AUPR) compared to the best individual model, as well as high F1 scores across different gene ontology categories, indicating the transferability of this approach.
