MedML: Fusing Medical Knowledge and Machine Learning Models for Early Pediatric COVID-19 Hospitalization and Severity Prediction

Paper Title

MedML: Fusing Medical Knowledge and Machine Learning Models for Early Pediatric COVID-19 Hospitalization and Severity Prediction

Authors

Junyi Gao, Chaoqi Yang, George Heintz, Scott Barrows, Elise Albers, Mary Stapel, Sara Warfield, Adam Cross, Jimeng Sun, the N3C consortium

Abstract

The COVID-19 pandemic has caused devastating economic and social disruption, straining the resources of healthcare institutions worldwide. This has led to a nationwide call for models to predict hospitalization and severe illness in patients with COVID-19 to inform the distribution of limited healthcare resources. We respond to one of these calls specific to the pediatric population. To address this challenge, we study two prediction tasks for the pediatric population using electronic health records: 1) predicting which children are more likely to be hospitalized, and 2) among hospitalized children, predicting which individuals are more likely to develop severe symptoms. We respond to the national Pediatric COVID-19 data challenge with a novel machine learning model, MedML. MedML extracts the most predictive features based on medical knowledge and propensity scores from over 6 million medical concepts and incorporates the inter-feature relationships between heterogeneous medical features via graph neural networks (GNN). We evaluate MedML across 143,605 patients for the hospitalization prediction task and 11,465 patients for the severity prediction task using data from the National COVID Cohort Collaborative (N3C) dataset. We also report detailed group-level and individual-level feature importance analyses to evaluate the model's interpretability. MedML achieves up to a 7% higher AUROC score and up to a 14% higher AUPRC score compared to the best baseline machine learning models and performs well across all nine national geographic regions and over all three-month spans since the start of the pandemic. Our cross-disciplinary research team has developed a method of incorporating clinical domain knowledge as the framework for a new type of machine learning model that is more predictive and explainable than current state-of-the-art data-driven feature selection methods.
