Title

Gene set proximity analysis: expanding gene set enrichment analysis through learned geometric embeddings

Authors

Cousins, Henry; Hall, Taryn; Guo, Yinglong; Tso, Luke; Tzeng, Kathy Tzy-Hwa; Cong, Le; Altman, Russ

Abstract

Gene set analysis methods rely on knowledge-based representations of genetic interactions in the form of both gene set collections and protein-protein interaction (PPI) networks. Explicit representations of genetic interactions often fail to capture complex interdependencies among genes, limiting the analytic power of such methods. Here we propose an extension of gene set enrichment analysis to a latent feature space reflecting PPI network topology, called gene set proximity analysis (GSPA). Compared with existing methods, GSPA provides improved ability to identify disease-associated pathways in disease-matched gene expression datasets, while improving reproducibility of enrichment statistics for similar gene sets. GSPA is statistically straightforward, reducing to classical gene set enrichment through a single user-defined parameter. We apply our method to identify novel drug associations with SARS-CoV-2 viral entry. Finally, we validate our drug association predictions through retrospective clinical analysis of claims data from 8 million patients, supporting a role for gabapentin as a risk factor and metformin as a protective factor for COVID-19 hospitalization.
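The abstract says GSPA reduces to classical gene set enrichment through a single user-defined parameter, but it does not give the statistic itself. As a toy illustration of how such a reduction can work (this is not the authors' GSPA algorithm), the sketch below generalizes the GSEA running-sum enrichment score to soft gene-set membership. Set members get weight 1, and each member lends weight, equal to its cosine similarity in a given embedding space, to its `k` nearest non-member genes. With `k = 0` the weights collapse to a 0/1 membership indicator and the score becomes the ordinary weighted Kolmogorov-Smirnov-style GSEA statistic. The function names, the cosine nearest-neighbor rule, and the parameter `k` are all hypothetical choices made for this example.

```python
import math
from typing import Dict, List, Sequence, Set

# Hypothetical illustration of "enrichment with proximity-expanded membership";
# NOT the published GSPA statistic.


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def soft_membership(
    gene_set: Set[str], embeddings: Dict[str, List[float]], k: int
) -> Dict[str, float]:
    """Weight 1.0 for members; each member also lends its cosine similarity
    (clipped to [0, 1]) to its k nearest non-member genes.
    With k = 0 this is exactly a 0/1 membership indicator."""
    weights = {g: (1.0 if g in gene_set else 0.0) for g in embeddings}
    non_members = [g for g in embeddings if g not in gene_set]
    for m in gene_set:
        if m not in embeddings:
            continue  # members without an embedding cannot lend weight
        ranked = sorted(
            non_members,
            key=lambda g: cosine(embeddings[m], embeddings[g]),
            reverse=True,
        )
        for g in ranked[:k]:
            sim = min(1.0, max(0.0, cosine(embeddings[m], embeddings[g])))
            weights[g] = max(weights[g], sim)  # keep the strongest link
    return weights


def enrichment_score(
    scores: Dict[str, float], weights: Dict[str, float], p: float = 1.0
) -> float:
    """GSEA-style running sum (P_hit - P_miss) with soft membership weights.
    Returns the signed maximum deviation from zero."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    hit_total = sum(weights.get(g, 0.0) * abs(scores[g]) ** p for g in ranked)
    miss_total = sum(1.0 - weights.get(g, 0.0) for g in ranked)
    if hit_total == 0 or miss_total == 0:
        return 0.0  # degenerate case: no hits or no misses
    running, best = 0.0, 0.0
    for g in ranked:
        w = weights.get(g, 0.0)
        running += w * abs(scores[g]) ** p / hit_total  # hit contribution
        running -= (1.0 - w) / miss_total  # miss contribution
        if abs(running) > abs(best):
            best = running
    return best
```

For example, with toy 2-D embeddings where gene `B` lies close to set member `A`, calling `soft_membership({"A"}, embeddings, k=0)` gives `B` weight 0 (plain GSEA), while `k=1` gives `B` a weight near its similarity to `A`, so a ranked list that puts `B` near the top still counts as evidence for the set. The design keeps the 0/1 case exact, so any soft-membership result can be compared directly against the classical statistic.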