Paper title

MASK: A flexible framework to facilitate de-identification of clinical texts

Authors

Nikola Milosevic, Gangamma Kalappa, Hesam Dadafarin, Mahmoud Azimaee, Goran Nenadic

Abstract

Medical health records and clinical summaries contain a vast amount of important information in textual form that can help advance research on treatments, drugs and public health. However, most of this information is not shared because it contains private information about patients, their families, or the medical staff treating them. Regulations such as HIPAA in the US, PHIPA in Canada and GDPR govern the protection, processing and distribution of this information. If the information is de-identified, with personal information replaced or redacted, it can be distributed to the research community. In this paper, we present MASK, a software package designed to perform the de-identification task. The software performs named entity recognition using state-of-the-art techniques and then masks or redacts the recognized entities. The user can select the named entity recognition algorithm (currently implemented are two versions of CRF-based techniques and a BiLSTM-based neural network with pre-trained GloVe and ELMo embeddings) and the masking algorithm (e.g. shift dates, replace names/locations, or fully redact the entity).
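The masking step described in the abstract can be illustrated with a minimal sketch. This is not MASK's actual API: the names `mask_text`, `redact`, `replace_name`, `make_date_shifter` and `span` are hypothetical, the entity spans are supplied by hand rather than produced by a CRF or BiLSTM tagger, and the ISO date format is an assumption. It only shows how a per-label masking strategy might be chosen and applied to recognized entities.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Tuple

# Hypothetical entity representation: (start offset, end offset, label).
Entity = Tuple[int, int, str]
Masker = Callable[[str], str]


def redact(surface: str) -> str:
    """Fully redact an entity, preserving its length."""
    return "X" * len(surface)


def replace_name(surface: str) -> str:
    """Replace a name with a fixed placeholder."""
    return "[NAME]"


def make_date_shifter(days: int, fmt: str = "%Y-%m-%d") -> Masker:
    """Build a masker that shifts dates by a fixed number of days.

    Assumes dates appear in `fmt`; anything unparseable is redacted.
    """
    def shift(surface: str) -> str:
        try:
            parsed = datetime.strptime(surface, fmt)
        except ValueError:
            return redact(surface)
        return (parsed + timedelta(days=days)).strftime(fmt)
    return shift


def mask_text(text: str, entities: List[Entity],
              strategies: Dict[str, Masker],
              default: Masker = redact) -> str:
    """Apply the strategy selected for each label to non-overlapping spans."""
    # Work right to left so earlier offsets stay valid after replacement.
    for start, end, label in sorted(entities, reverse=True):
        masker = strategies.get(label, default)
        text = text[:start] + masker(text[start:end]) + text[end:]
    return text


def span(text: str, surface: str, label: str) -> Entity:
    """Helper for the example: locate a surface string and label it."""
    start = text.index(surface)
    return (start, start + len(surface), label)


note = "John Smith was admitted on 2020-01-15 in Toronto."
# Stand-in for the output of a named entity recognition model.
found = [
    span(note, "John Smith", "NAME"),
    span(note, "2020-01-15", "DATE"),
    span(note, "Toronto", "LOCATION"),
]
chosen = {"NAME": replace_name, "DATE": make_date_shifter(30)}
masked = mask_text(note, found, chosen)
print(masked)
```

Labels without an explicit strategy (here `LOCATION`) fall back to full redaction, which is the conservative choice when the goal is to avoid leaking identifiers.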
