Understanding patient complaint characteristics using contextual clinical BERT embeddings

Paper title

Understanding patient complaint characteristics using contextual clinical BERT embeddings

Authors

Saha, Budhaditya; Lisboa, Sanal; Ghosh, Shameek

Abstract

In clinical conversational applications, extracted entities tend to capture the main subject of a patient's complaint, namely symptoms or diseases. However, they mostly fail to recognize the characterizations of a complaint, such as the time, the onset, and the severity. For example, if the input is "I have a headache and it is extreme", state-of-the-art models recognize only the main symptom entity, headache, and ignore the severity factor "extreme" that characterizes the headache. In this paper, we design a two-stage approach to detect the characterizations of entities such as symptoms, as presented by general users in contexts where they would describe their symptoms to a clinician. We use Word2Vec and BERT to encode clinical text given by the patients. We transform the output and reframe the task as a multi-label classification problem. Finally, we combine the processed encodings with the Linear Discriminant Analysis (LDA) algorithm to classify the characterizations of the main entity. Experimental results demonstrate that our method achieves a 40-50% improvement in accuracy over state-of-the-art models.
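The multi-label reframing mentioned in the abstract can be illustrated with a minimal sketch. It only shows how the characteristics of one complaint might be turned into a 0/1 target vector; it does not reproduce the authors' encoders or their LDA classifier. The label set, the function name `to_multi_hot`, and the example sentences are assumptions made for illustration and are not taken from the paper.

```python
# Hypothetical label set, based on the characteristics named in the abstract.
CHARACTERISTICS = ["time", "onset", "severity"]


def to_multi_hot(present, labels=CHARACTERISTICS):
    """Return a 0/1 vector marking which characteristics a complaint carries."""
    unknown = set(present) - set(labels)
    if unknown:
        raise ValueError(f"unknown characteristics: {sorted(unknown)}")
    return [1 if label in present else 0 for label in labels]


# Illustrative, hand-annotated examples (assumed, not data from the paper).
examples = [
    ("I have a headache and it is extreme", {"severity"}),
    ("My headache started suddenly last night", {"onset", "time"}),
    ("I have a headache", set()),
]

# One target vector per complaint; several positions may be 1 at once,
# which is what makes the task multi-label rather than multi-class.
targets = [to_multi_hot(chars) for _, chars in examples]
```

In a full pipeline, each target vector would be paired with an encoding of the sentence (for example from Word2Vec or BERT), and a classifier would predict every position of the vector independently.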
