Paper Title

Multiple Sclerosis Severity Classification From Clinical Text

Authors

Costa, Alister D; Denkovski, Stefan; Malyska, Michal; Moon, Sae Young; Rufino, Brandon; Yang, Zhen; Killian, Taylor; Ghassemi, Marzyeh

Abstract

Multiple Sclerosis (MS) is a chronic, inflammatory and degenerative neurological disease, which is monitored by a specialist using the Expanded Disability Status Scale (EDSS) and recorded in unstructured text in the form of a neurology consult note. An EDSS measurement contains an overall "EDSS" score and several functional subscores. Typically, expert knowledge is required to interpret consult notes and generate these scores. Previous approaches used limited context length Word2Vec embeddings and keyword searches to predict scores given a consult note, but often failed when scores were not explicitly stated. In this work, we present MS-BERT, the first publicly available transformer model trained on real clinical data other than MIMIC. Next, we present MSBC, a classifier that applies MS-BERT to generate embeddings and predict EDSS and functional subscores. Lastly, we explore combining MSBC with other models through the use of Snorkel to generate scores for unlabelled consult notes. MSBC achieves state-of-the-art performance on all metrics and prediction tasks and outperforms the models generated from the Snorkel ensemble. We improve Macro-F1 by 0.12 (to 0.88) for predicting EDSS and on average by 0.29 (to 0.63) for predicting functional subscores over previous Word2Vec CNN and rule-based approaches.
