Paper Title

Development of an Extractive Clinical Question Answering Dataset with Multi-Answer and Multi-Focus Questions

Paper Authors

Moon, Sungrim, He, Huan, Liu, Hongfang, Fan, Jungwei W.

Paper Abstract

Background: Extractive question-answering (EQA) is a useful natural language processing (NLP) application for answering patient-specific questions by locating answers in their clinical notes. Realistic clinical EQA can have multiple answers to a single question and multiple focus points in one question, which are lacking in the existing datasets for development of artificial intelligence solutions. Objective: Create a dataset for developing and evaluating clinical EQA systems that can handle natural multi-answer and multi-focus questions. Methods: We leveraged the annotated relations from the 2018 National NLP Clinical Challenges (n2c2) corpus to generate an EQA dataset. Specifically, the 1-to-N, M-to-1, and M-to-N drug-reason relations were included to form the multi-answer and multi-focus QA entries, which represent more complex and natural challenges in addition to the basic one-drug-one-reason cases. A baseline solution was developed and tested on the dataset. Results: The derived RxWhyQA dataset contains 96,939 QA entries. Among the answerable questions, 25% require multiple answers, and 2% ask about multiple drugs within one question. There are frequent cues observed around the answers in the text, and 90% of the drug and reason terms occur within the same or an adjacent sentence. The baseline EQA solution achieved a best f1-measure of 0.72 on the entire dataset, and on specific subsets, it was: 0.93 on the unanswerable questions, 0.48 on single-drug questions versus 0.60 on multi-drug questions, 0.54 on the single-answer questions versus 0.43 on multi-answer questions. Discussion: The RxWhyQA dataset can be used to train and evaluate systems that need to handle multi-answer and multi-focus questions. Specifically, multi-answer EQA appears to be challenging and therefore warrants more investment in research.
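
For readers unfamiliar with the extractive QA format, the sketch below illustrates how a single 1-to-N drug-reason relation of the kind described in the Methods could be serialized as a SQuAD-2.0-style entry with multiple gold answers and an unanswerable-question flag. The field names ("qas", "answers", "is_impossible"), the question template, and the helper function are illustrative assumptions modeled on the SQuAD 2.0 convention, not the authors' actual schema.

```python
# A minimal sketch (assumed schema, not the RxWhyQA release format) of turning
# one drug-reason relation into a SQuAD-2.0-style multi-answer EQA entry.

import json

def make_qa_entry(note_text, drug, reason_texts, qa_id):
    """Build one QA entry asking why a drug was prescribed.

    reason_texts: list of reason strings found in note_text;
    an empty list yields an unanswerable question.
    """
    return {
        "context": note_text,
        "qas": [
            {
                "id": qa_id,
                "question": f"Why was the patient prescribed {drug}?",
                "answers": [
                    {"text": t, "answer_start": note_text.index(t)}
                    for t in reason_texts
                ],
                "is_impossible": len(reason_texts) == 0,
            }
        ],
    }

if __name__ == "__main__":
    note = "Started lisinopril for hypertension. Continued metformin."
    entry = make_qa_entry(
        note,
        drug="lisinopril",
        reason_texts=["hypertension"],
        qa_id="example-0001",
    )
    print(json.dumps(entry, indent=2))
```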

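The F1-measures in the Results are presumably token-overlap scores of the kind used to evaluate SQuAD-style extractive QA. The sketch below shows a simplified version of that metric, taking the best F1 over all acceptable gold answers for a question; it is an assumption for illustration, not the paper's exact scoring script.

```python
# A simplified, assumed variant of SQuAD-style token-overlap F1 for extractive QA.

from collections import Counter

def token_f1(prediction, gold):
    """Token-level F1 between one predicted span and one gold span."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

def best_f1(prediction, gold_answers):
    """Score a prediction against multiple acceptable gold answers."""
    if not gold_answers:  # unanswerable question: credit only empty predictions
        return 1.0 if prediction == "" else 0.0
    return max(token_f1(prediction, g) for g in gold_answers)

print(best_f1("for hypertension", ["hypertension"]))  # ~0.67
```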