Paper Title
Multi-modal Protein Knowledge Graph Construction and Applications
Paper Authors
Abstract
Existing data-centric methods for protein science generally cannot sufficiently capture and leverage biological knowledge, which may be crucial for many protein tasks. To facilitate research in this field, we create ProteinKG65, a knowledge graph for protein science. Using the Gene Ontology and the UniProt Knowledgebase as a basis, we transform and integrate various kinds of knowledge, aligning descriptions with GO terms and protein sequences with protein entities. ProteinKG65 is mainly dedicated to providing a specialized protein knowledge graph, bringing the knowledge of the Gene Ontology to protein function and structure prediction. We also illustrate the potential applications of ProteinKG65 with a prototype. Our dataset can be downloaded at https://w3id.org/proteinkg65.