Paper Title
NeuraHealth: An Automated Screening Pipeline to Detect Undiagnosed Cognitive Impairment in Electronic Health Records with Deep Learning and Natural Language Processing
Paper Authors
Paper Abstract
Dementia-related cognitive impairment (CI) is a neurodegenerative disorder that affects over 55 million people worldwide and is growing rapidly, at the rate of one new case every 3 seconds. Globally, 75% of cases go undiagnosed, rising to as much as 90% in low- and middle-income countries, leading to an estimated annual worldwide cost of USD 1.3 trillion, forecasted to reach USD 2.8 trillion by 2030. With no cure, a recurring failure of clinical trials, and a lack of early diagnosis, the mortality rate is 100%. Information in electronic health records (EHR) can provide vital clues for early detection of CI, but manual review by experts is tedious and error-prone. Several computational methods have been proposed; however, they lack an enhanced understanding of the linguistic context in the complex language structures of EHR. Therefore, I propose a novel and more accurate framework, NeuraHealth, to identify patients who had no earlier diagnosis. In NeuraHealth, using patient EHR from the Mass General Brigham BioBank, I fine-tuned a bi-directional attention-based deep learning natural language processing model to classify sequences. The sequence predictions were used to generate structured features as input for a patient-level regularized logistic regression model. This two-step framework creates high dimensionality and outperforms all existing state-of-the-art computational methods as well as clinical methods. Further, I integrate the models into a real-world product, a web app, to create an automated EHR screening pipeline for scalable and high-speed discovery of undetected CI in EHR, making early diagnosis viable in medical facilities and in regions with scarce health services.
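The two-step structure described in the abstract (sequence-level predictions aggregated into structured features, then a patient-level regularized logistic regression) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and not the paper's implementation: the sequence scores are hard-coded stand-ins for the output of the fine-tuned language model, the three aggregate features (mean, maximum, fraction above a threshold) are hypothetical choices, the L2 penalty is one possible form of regularization, and the function names and toy labels are invented.

```python
import math
from collections import defaultdict


def patient_features(seq_scores, threshold=0.5):
    """Step 1 (hypothetical): turn per-sequence scores into one feature row per patient.

    seq_scores: iterable of (patient_id, score), where score stands in for the
    sequence classifier's probability that a note sequence indicates CI.
    Returns {patient_id: [mean score, max score, fraction of scores >= threshold]}.
    """
    grouped = defaultdict(list)
    for patient_id, score in seq_scores:
        grouped[patient_id].append(score)
    return {
        pid: [sum(s) / len(s), max(s), sum(1 for v in s if v >= threshold) / len(s)]
        for pid, s in grouped.items()
    }


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Patient-level probability under the fitted logistic model."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


def fit_logreg_l2(X, y, lam=0.01, lr=0.5, epochs=3000):
    """Step 2 (assumed form): L2-regularized logistic regression via batch gradient descent.

    The bias term is left unregularized; lam, lr and epochs are illustrative values.
    """
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * w for w in weights]  # gradient of (lam / 2) * ||w||^2
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(weights, bias, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


# Toy, made-up data: (patient_id, sequence-level score) pairs and patient labels.
seq_scores = [
    ("p1", 0.9), ("p1", 0.8), ("p1", 0.2),
    ("p2", 0.1), ("p2", 0.05),
    ("p3", 0.7), ("p3", 0.95),
    ("p4", 0.2), ("p4", 0.3),
]
labels = {"p1": 1, "p2": 0, "p3": 1, "p4": 0}

feats = patient_features(seq_scores)
ids = sorted(feats)
X = [feats[p] for p in ids]
y = [labels[p] for p in ids]
weights, bias = fit_logreg_l2(X, y)
scores = {p: predict_proba(weights, bias, feats[p]) for p in ids}

for p in ids:
    print(p, round(scores[p], 3))
```

In a real pipeline the hard-coded scores would come from the fine-tuned sequence classifier, and the patient-level model would be fit and evaluated on held-out patients instead of the same toy rows used for training.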