Hierarchical Annotation for Building A Suite of Clinical Natural Language Processing Tasks: Progress Note Understanding

Paper Title

Hierarchical Annotation for Building A Suite of Clinical Natural Language Processing Tasks: Progress Note Understanding

Authors

Yanjun Gao, Dmitriy Dligach, Timothy Miller, Samuel Tesch, Ryan Laffin, Matthew M. Churpek, Majid Afshar

Abstract

Applying natural language processing methods to electronic health record (EHR) data is a growing field. Existing corpora and annotations focus on modeling textual features and relation prediction. However, there is a paucity of annotated corpora built to model clinical diagnostic thinking, a process involving text understanding, domain knowledge abstraction, and reasoning. This work introduces a hierarchical annotation schema with three stages to address clinical text understanding, clinical reasoning, and summarization. We created an annotated corpus based on an extensive collection of publicly available daily progress notes, a type of EHR documentation that is collected in time series in a problem-oriented format. The conventional format for a progress note follows a Subjective, Objective, Assessment and Plan (SOAP) heading structure. We also define a new suite of tasks, Progress Note Understanding, with three tasks utilizing the three annotation stages. The novel suite of tasks was designed to train and evaluate future NLP models for clinical text understanding, clinical knowledge representation, inference, and summarization.
