Classifying Unstructured Clinical Notes via Automatic Weak Supervision

Paper Title

Classifying Unstructured Clinical Notes via Automatic Weak Supervision

Authors

Chufan Gao, Mononito Goswami, Jieshi Chen, Artur Dubrawski

Abstract

Healthcare providers usually record detailed notes of the clinical care delivered to each patient for clinical, research, and billing purposes. Due to the unstructured nature of these narratives, providers employ dedicated staff to assign diagnostic codes to patients' diagnoses using the International Classification of Diseases (ICD) coding system. This manual process is not only time-consuming but also costly and error-prone. Prior work demonstrated potential utility of Machine Learning (ML) methodology in automating this process, but it has relied on large quantities of manually labeled data to train the models. Additionally, diagnostic coding systems evolve with time, which makes traditional supervised learning strategies unable to generalize beyond local applications. In this work, we introduce a general weakly-supervised text classification framework that learns from class-label descriptions only, without the need to use any human-labeled documents. It leverages the linguistic domain knowledge stored within pre-trained language models and the data programming framework to assign code labels to individual texts. We demonstrate the efficacy and flexibility of our method by comparing it to state-of-the-art weak text classifiers across four real-world text classification datasets, in addition to assigning ICD codes to medical notes in the publicly available MIMIC-III database.
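
To make the idea of labeling text from class descriptions alone more concrete, here is a minimal sketch of description-driven weak labeling. It is not the authors' method: the paper combines pre-trained language models with a data programming framework, whereas this sketch substitutes plain keyword matching and an unweighted majority vote. The class descriptions, the example notes, and all function names are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

ABSTAIN = -1  # value a labeling function returns when it has no opinion

# Hypothetical class-label descriptions (not taken from the paper or from ICD).
CLASS_DESCRIPTIONS = {
    0: "pneumonia lung infection",
    1: "heart failure cardiac",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def make_labeling_functions(descriptions):
    """Build one keyword-based labeling function per (class, keyword) pair."""
    labeling_functions = []
    for label, description in descriptions.items():
        for keyword in tokenize(description):
            # Bind keyword and label as defaults so each closure keeps its own.
            def lf(document, keyword=keyword, label=label):
                return label if keyword in tokenize(document) else ABSTAIN
            labeling_functions.append(lf)
    return labeling_functions


def majority_vote(document, labeling_functions):
    """Aggregate non-abstaining votes; abstain if no function fires."""
    votes = [lf(document) for lf in labeling_functions]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]


lfs = make_labeling_functions(CLASS_DESCRIPTIONS)
notes = [
    "Patient admitted with lung infection consistent with pneumonia.",
    "Known cardiac history, presenting with decompensated heart failure.",
    "Routine follow-up, no acute complaints.",
]
labels = [majority_vote(note, lfs) for note in notes]
print(labels)
```

In a full data programming setup, the majority vote would typically be replaced by a learned label model that weights each labeling function by its estimated accuracy, and the resulting probabilistic labels would be used to train a downstream classifier.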
